In planning the vocal training of children it would be helpful to 
know what are the capabilities of children at different age levels, how 
much improvement one might expect to accomplish through training, 
and what methods of training are most profitable. The aim in this 
paper is to present some findings on these questions, obtained through 
tests of children and adults and through experimental work with 
groups of children. 


AGE DIFFERENCES IN VOCAL RANGE 


The data in this study consist in part of the results obtained in 
tests of vocal reproduction of pitch administered to four hundred seven 
children, aged two to ten years, and to sixty-five adults. 

The tests were designed to obtain a record of the particular tones 
each subject could sing. The scoring was based upon the reproduction 
of the individual tones and half-tones corresponding to the C major 
scale. Each tone or half-tone correctly reproduced added a tally of 
one to the subject’s score. 

In administering the test, the tester would sound each tone on a 
piano (or a xylophone or psaltery in the case of some of the younger 
children), sing the tone, and ask the subject to reproduce it. Credit 
was given only for the reproduction of the particular tones that were 
presented. As mentioned above, the scoring was based only upon 
the reproduction of tones on the C major scale, but in prompting the 
child to sing during the course of the tests the notes presented to 
him were not limited to this scale. 

The scoring of the subject’s performance depended upon the judg- 
ment of the experimenter. In the case of each tone the subject tried 
to sing, the experimenter judged whether this particular tone had been 
reproduced. 

It can be seen at once that the crucial issue arises as to whether 
valid measurements can be obtained by this method. To obtain 
completely objective and accurate measurements of the pitch of a 
subject’s singing it would, of course, be necessary to use mechanical 
devices. This truth was recognized by the authors at the beginning 
of the study, but it was decided to experiment with the possibility 
of obtaining records that might have practical value without the 
use of apparatus. The procedure of scoring on the basis of the experi- 
menter’s judgments, it can be seen, corresponds somewhat to the test 
which a pupil meets when receiving vocal instruction and which even 
the professional singer must face in a practical audition. Insofar 
as the teacher, the critical auditor, or the experimenter represent 
the standards by which a singer’s performance will normally be 
judged, their judgments will have practical validity even though they 
might deviate somewhat from the standards of accuracy that could 
be applied if mechanical measuring instruments were used. 

To determine the validity of the experimenter’s judgments, within 
the limitations of this method, several tests were applied. Some of 
these tests are described in an earlier publication.¹ While tests of 
children were in progress, other musically trained observers were 
introduced. These observers and the experimenter took independent 
records of their judgments of the children’s performance, and, at 
the end of the test series, item by item comparisons were made between 
judgments they had recorded. This procedure was used once in a 
test of twenty children aged three years, again in a test of ten three- 
year-old children, again in tests of twenty-three children, aged 3 to 
8½ years, and again in tests of thirty-five children aged nine and ten 
years. All told, the experimenter’s judgments were checked against 
the judgments of six musically trained individuals, two of whom 
possessed what has been known as “absolute pitch.” 

1 Jersild, A. T. and Bienstock, S. F.: “The Influence of Training on the Vocal 
Ability of Three-year-old Children.” Child Development, Vol. II, 1931, pp. 
272-291. 

When item by item comparisons were made between the experi- 
menter’s and the other judges’ records, the agreement in the various 
test series ranged from eighty-four to ninety-six per cent, with an 
average of ninety-two per cent. These figures indicate that the 
experimenter’s judgments were quite accurate insofar as agreement 
between independent judges gives an indication of accuracy. 
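
The item by item comparison described above amounts to counting the tones on which two judges recorded the same verdict and expressing that count as a percentage of all tones judged. The following is only a minimal sketch of that arithmetic; the function name and the sample records are hypothetical and are not taken from the study.

```python
def percent_agreement(judge_a, judge_b):
    """Item-by-item agreement between two judges' records, in per cent."""
    if len(judge_a) != len(judge_b):
        raise ValueError("both records must cover the same items")
    if not judge_a:
        raise ValueError("records are empty")
    # Count the items on which the two judges gave the same verdict.
    matches = sum(a == b for a, b in zip(judge_a, judge_b))
    return 100.0 * matches / len(judge_a)


# Hypothetical records: True means the judge credited the tone as reproduced.
experimenter = [True, True, False, True, False, True, True, False, True, True]
observer = [True, True, False, True, True, True, True, False, True, True]

print(percent_agreement(experimenter, observer))
```

A simple percentage of this kind does not correct for agreement that would arise by chance, which is one reason the text treats agreement between judges only as an indication of accuracy.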

Each subject was tested three times. The three tests were 
administered on as many separate days, except in the case of twenty 
children who received two of their three tests on one day with an hour 
or more intervening between the two tests. The procedure of giving 
each child three separate tests made it possible to obtain an indication 
of the consistency of the scores. The correlation between the scores 
obtained in the second and third tests, calculated at each age level, 
ranged from .76 to .97, with an average coefficient of .87.¹ (The 
records of the first test, which was designed largely to acquaint 
the child with the procedure and to win his cooperation, were not 
tabulated.) 
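
The consistency figures above are correlations between the scores of the same children on the second and third tests. The sketch below shows how such a coefficient might be computed. It assumes the product-moment (Pearson) coefficient, and it assumes that the small figure attached to each coefficient in the footnote is a probable error computed as 0.6745(1 - r²)/√n; the text states neither point, and the scores shown are hypothetical.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equally long score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary in both lists")
    return sxy / math.sqrt(sxx * syy)


def probable_error(r, n):
    """Probable error of r (assumed convention, not confirmed by the text)."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)


# Hypothetical numbers of tones sung by six children on tests two and three.
test_two = [5, 7, 9, 12, 14, 16]
test_three = [6, 7, 10, 11, 15, 16]

r = pearson_r(test_two, test_three)
print(f"r = {r:.2f}, probable error = {probable_error(r, len(test_two)):.2f}")
```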

1 The correlations between tests two and three at each age level were as follows: 
.76 ± .05 at two years; .83 ± .03 at three years; .90 ± .02 at four; .82 ± .03 at 
five; .86 ± .03 at six; .89 ± .02 at seven; .97 ± .01 at eight; .85 ± .03 at nine; 
.91 ± .02 at ten years. The number of subjects at each age level is given in 
Table I. 

As mentioned above, the subject’s score represented a tally of the 
number of tones he was able to reproduce. The aim was primarily 
to test execution rather than discrimination. The subject was 
simply asked to sing tones, rather than to indicate by some other 
sign his ability to appreciate the difference between tones. The 
criterion was the pitch and not the quality of tone that was sung. 
In the case of tones at the extremes of his vocal range, the subject 
received credit if, in the judgment of the experimenter, he temporarily 
produced the pitch of the tone that was presented even though he 
might not be able to sustain this pitch for any length of time. More- 
over, since credit was given on the basis of individual tones, a subject’s 
score would not indicate how well he might be able to use them or 
incorporate them into a song. This is particularly true of the tones 
at the extremes of the subject’s vocal range. 

The children were usually brought to be tested in groups of two to 
four, but occasionally as many as eight were brought to the testing 
room at one time. Although each child’s record was based solely 
upon his performance when he sang alone, a good deal of informality 
and variety was introduced into the tests, and efforts were made to 
encourage the children to be at ease and to express themselves freely. 
If a child appeared to be ill at ease when first asked to sing he was 
given an opportunity some minutes later, after other children had 
had their turn. After the first test series had been completed, the 
procedure in the second and third series was to include in each group 
at least one child who had previously responded well and had made 
a relatively high score. This child was chosen as the first one within 
the group to receive the individual test. While he sang, the others 
listened. 

During the test, the tester faced the child. An effort was first 
made to find a tone which the child could reproduce with ease and 
then to work up and down from this tone. When the child began to 
have difficulty and appeared to have reached the upper or lower 
extreme of his vocal range, the tester continued to encourage the 
child’s efforts by sounding and singing the tone several times (a 
maximum of eight) and also led the child to approach the tone from 
tones that were well within his range. If the child was still unsuccess- 
ful, the tester would present tones quite beyond the range of anything 
hitherto reproduced by the subject in an attempt to stimulate him 
to shift to “head tones” or ‘chest tones” as the case might be. It 
was noticed that sometimes a child whose range at first seemed quite 
limited would reproduce several additional tones after this device 
had been used to encourage him to re-adjust his voice to the singing 
of high or low tones. 

During the tests, words of encouragement and praise were used 
quite liberally and apparently with salutary effect, although occa- 
sionally it was helpful to use a commanding tone of voice to lead the 
child to listen more carefully and to desist from singing at random. 

The children who were tested were obtained from schools, kindergartens, 
nursery schools and day nurseries; approximately ninety 
per cent of the children, between the ages of five and ten, were enrolled 
in public schools and kindergartens, drawn almost entirely from two 
schools, one of which was located in a relatively poor district in the 
City of New York, and the other in a district that was slightly above 
the average in general socio-economic status. The children of pre- 
school age were about equally divided between private nursery schools 
and day nurseries supported by charity or public funds. An effort 
was made to obtain a sampling as representative as possible with 
respect to intelligence; intelligence test scores and age-grade place- 
ment were taken into account in selecting subjects. Intelligence test 
scores were not available in all cases, however; the writers estimate 
that the subjects include a somewhat larger number of children with 
IQ’s above 100 than below, and that the average IQ of all children in 
the study would be about 110. 

Results of Singing Tests.—Table I shows the mean and the median 
number of tones sung by boys and girls at yearly and at half-yearly 
age levels, as well as the scores of adults. The figures are based upon 
the scores obtained in the last of the three tests administered to each 
subject. 

In Figs. 1 and 2 the changes with age in pitch range are shown 
in graphic form. Figure 1 represents the average scores of all subjects 
and the separate averages of boys and girls. Figure 2 shows the 
corresponding median scores, with the addition of the twenty-fifth 
and seventy-fifth percentile scores. 

As indicated by the data given in Table I, the scores at each age 
level cover a wide range; it can be seen likewise that there is frequently 
a very wide spread between the twenty-fifth and seventy-fifth per- 
centile scores. 

Due to the wide distribution of the scores at each age level, neither 
the average nor the median score represents typical children. In 
Table II the scores of all subjects at all age levels are reproduced. 

The averages in Table I show an increase at each age from two to 
nine years, with a nominal decline at the age of ten. The median 
scores likewise rise with age, although less gradually than the averages. 
The average and the median scores at half-yearly intervals, as well 
as the corresponding scores of boys and girls, show several irregu- 
larities, due largely, no doubt, to the limited number of cases in these 
sub-groupings. 

Fig. 1.—Average number of tones sung by all children, by boys and girls at 
different age levels, and by adults. 

Fig. 2.—Median and upper and lower quartile scores of all children at different 
age levels and of adults, and median scores of males and females at different 
age levels. 

Perhaps the most impressive feature of Table I is the evidence of 
rapid development of ability to sing a wide range of tones. The 
improvement from the age of two to six years is decidedly greater 
than the improvement from six years to maturity; this must be 
qualified by the observation that during the latter interval the boys lose 
some of their high tones. When boys are omitted from the compari- 
son, the evidence of rapid development of vocal range stands out 
even more sharply. Further indications of the early development 
of the child’s pitch range appear in the figures showing high and low 
individual scores at each age level. At the age of four years, for 
example, one child sang a total of 22 tones, which is a larger score 
than that achieved by the average adult. This score does not mean, 
to be sure, that the child would be able to use all of these tones effec- 
tively in a song or that he could sing all of the tones with equal ease. 
But the scores of the exceptionally capable children as well as the scores 
of average children all indicate that a child has the capacity to produce 
a wide range of tones at an early age. 

[Table II. Scores of all subjects at each age level, from 2 yrs. to 10 yrs., 
and of adults. The median score at each age level is italicized.] 

Further comparisons between individuals at different age levels 
follow. When children are grouped according to the respective age 
intervals of eighteen months, two years and three years, the following 
scores appear: 





Age in months                  24-41  42-59  60-77  78-95  96-113  114-131  Adults
Number of cases                   49     69     78     78      67       66      65
Average number of tones sung     6.1    8.6   10.9   14.2    15.2     15.7    19.7
Median number of tones sung      5.0    8.0   10.5   14.0    16.0     16.0    20.0

Age in years                     2-3    4-5    6-7    8-9      10   Adults
Number of cases                   73     97    104     89      44       65
Average number of tones sung     6.3    9.9   13.8   15.4    15.8     19.7
Median number of tones sung      5.0    9.0   13.0   16.0    16.0       20

Age in years                     2-4    5-7   8-10   Adults
Number of cases                  118    156    133       65
Average number of tones sung     ...   12.5   15.4     19.7
Median number of tones sung      ...   12.0   16.0       20
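
The three groupings are related by simple arithmetic: a three-year group is the union of two adjacent eighteen-month groups (for example, 60 to 95 months corresponds to ages five through seven), so its average is the case-weighted average of theirs. The sketch below checks this against the figures reported above. The function and variable names are illustrative, and because the published averages are rounded to one decimal, the pooled values can be expected to agree only approximately.

```python
def pooled_mean(groups):
    """Case-weighted mean of several (number_of_cases, mean) pairs."""
    total_cases = sum(n for n, _ in groups)
    if total_cases == 0:
        raise ValueError("no cases to pool")
    return sum(n * mean for n, mean in groups) / total_cases


# (number of cases, average number of tones) for the eighteen-month groups,
# keyed by age in months, as reported in the text.
by_eighteen_months = {
    "24-41": (49, 6.1),
    "42-59": (69, 8.6),
    "60-77": (78, 10.9),
    "78-95": (78, 14.2),
    "96-113": (67, 15.2),
    "114-131": (66, 15.7),
}

total_children = sum(n for n, _ in by_eighteen_months.values())
ages_5_to_7 = pooled_mean([by_eighteen_months["60-77"], by_eighteen_months["78-95"]])
ages_8_to_10 = pooled_mean([by_eighteen_months["96-113"], by_eighteen_months["114-131"]])

print(total_children)  # should equal the 407 children tested
print(f"ages 5-7: {ages_5_to_7:.2f}; ages 8-10: {ages_8_to_10:.2f}")
```

The pooled values fall within a tenth of a tone of the reported three-year averages of 12.5 and 15.4.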

















According to the present results, a person realizes a large portion 
of his potential pitch range while he is still in the first three grades in 
the elementary school. This seems to hold true especially in the case of 
girls. It would appear that this observation has important implica- 
tions. From the point of view of the musical education of children, 
the present findings suggest that it would be profitable to emphasize 
vocal training in the lower school grades. Through such training the 
child might be led to capitalize upon his ability and to acquire skill in 
the use of his voice. Moreover, emphasis on training at this early age 
might help to prevent the formation of habits of disuse which might 
make it difficult for the child to realize his potential skill in later years. 

Sex Differences.—The findings represented in Table I and in Figures 
1 and 2 indicate that girls sang a somewhat larger number of tones than 
did the boys at several age levels. At no age, however, is the difference 
between the average scores of boys and girls three times its standard 
error. The reliability of the difference between the boys’ and girls’ 
averages at each age level from five to ten years ranges from .78 to 
1.35. When all scores are considered, from the age of two to ten years 
inclusive, the girls have an average score of 12.7 as compared with an 
average of 11.4 for the boys. The reliability of the difference between 
the two averages (Diff./σ diff.) is 2.21. Although the rather consistent 
superiority of the girls seems to indicate that there is a sex difference 
in their favor, the findings are not conclusive on this point.¹ Further 
comparisons between boys and girls will be given in a later table. 
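
The reliability figure used above is the ratio of the difference between two averages to the standard error of that difference. The following sketch illustrates the computation under the assumption that the two groups are independent, so that the standard error of the difference is the square root of the sum of the squared standard errors of the two averages. The two averages are those given in the text; the standard errors are hypothetical, chosen only so that the ratio comes out near the reported figure, since the text does not report them.

```python
import math


def critical_ratio(mean_a, mean_b, se_a, se_b):
    """Difference between two means divided by its standard error."""
    sigma_diff = math.sqrt(se_a ** 2 + se_b ** 2)  # assumes independent groups
    return (mean_a - mean_b) / sigma_diff


girls_mean, boys_mean = 12.7, 11.4  # averages reported in the text
girls_se, boys_se = 0.42, 0.41      # hypothetical standard errors

cr = critical_ratio(girls_mean, boys_mean, girls_se, boys_se)
print(f"critical ratio = {cr:.2f}")
# Criterion used in the text: the difference should be at least
# three times its standard error to be regarded as reliable.
print("meets the criterion of three:", cr >= 3.0)
```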

Tones Sung by Children at Each Age.—More important, perhaps, 
than the number of tones sung by a child at a particular age is the 
question as to what these particular tones are. To study this question, 
a tabulation was made of the number of subjects who reproduced each 
tone. These figures, in turn, were converted into percentages to show 
the relative score of each tone. Table III, which is based on this 
compilation, shows the tones reproduced by 50 per cent or more of the 
subjects at each age. 
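
The tabulation just described can be sketched as follows: count, for each tone, the number of subjects who reproduced it, convert the counts to percentages of the group, and keep the tones that reach fifty per cent. The records below are hypothetical, and the tone labels (letter plus octave number, with C4 as Middle C) are a modern notation adopted here for convenience rather than the notation of the study.

```python
from collections import Counter


def tones_sung_by_half(records, threshold=50.0):
    """Return {tone: per cent of subjects} for tones at or above threshold.

    records holds one set of tone labels per subject.
    """
    n_subjects = len(records)
    if n_subjects == 0:
        raise ValueError("no subjects")
    # Each subject counts at most once per tone because each record is a set.
    counts = Counter(tone for sung in records for tone in sung)
    percentages = {tone: 100.0 * c / n_subjects for tone, c in counts.items()}
    return {tone: pct for tone, pct in percentages.items() if pct >= threshold}


# Hypothetical records for four children of one age level.
records = [
    {"C4", "D4", "E4", "F4"},
    {"D4", "E4", "F4", "G4"},
    {"D4", "E4"},
    {"E4", "F4", "G4", "A4"},
]

result = tones_sung_by_half(records)
for tone in sorted(result):
    print(tone, result[tone])
```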

According to the results shown in Table III, children have acquired 
a relatively wide tonal repertory by the age of six years. Both boys 
and girls acquire deeper tones between the ages of ten years and matur- 
ity. Girls retain the ability to reproduce the high tones which were a 
part of their repertoire at the age of ten, while, as is already well 
known, boys lose the ability to reproduce some of the higher tones. 
To provide an adequate picture of the voice changes that take place 
between pre-adolescence and maturity would require a larger number 
of cases than is represented here, as well as data from tests of children 
at various age levels above ten years. 

1 Evidence that girls may be slightly superior to boys in pitch discrimination, 
as measured by the Kwalwasser-Dykema Music Tests, is presented by J. Kwal- 
wasser (Problems in Public School Music, M. Witmark and Sons, N. Y., 1932, 
pp. 159). 

In an earlier study of smaller groups of children the authors 
presented findings which indicated that children seemed to be able to 
sing tones lower in pitch than the limits that have been set forth in 
some manuals dealing with the musical training of children. The 
present results are consistent with these earlier findings. The claim 
has been made, for example, that songs for young children should 
include tones falling within the range from E first line to E fourth space. 
In contrast with this, the data in Table III indicate that at all ages, 
from two years and on, fifty per cent of the children were able to sing 
one or more tones below E first line. E fourth space was not sung by 
fifty per cent of the children until the age of six years. Moreover, at 
all ages the children were able to sing a greater number of tones below 
first line E than above fourth space E. 


TABLE III.—TONES SUNG BY FIFTY PER CENT OR MORE OF THE SUBJECTS AT EACH 
AGE LEVEL. THE FIGURES FOR THE ENTIRE GROUP, AS WELL AS SEPARATE 
FIGURES FOR BOYS AND GIRLS, ARE GIVEN AT THE AGE OF TEN YEARS. 
MIDDLE C IS ITALICIZED. A FEW TONES REPRODUCED BY SLIGHTLY 
LESS THAN FIFTY PER CENT OF THE CHILDREN ARE SHOWN IN 
PARENTHESES 

Age                Tones 

2 ................ D E F G A 
3 ................ C D E F G (A)¹ 
4 ................ B C D E F G A B C 
5 ................ A B C D E F G A B C D 
6 ................ A B C D E F G A B C D E F G 
7 ................ A B C D E F G A B C D E F (G)² 
8 ................ G A B C D E F G A B C D E F G 
9 ................ F G A B C D E F G A B C D E F G 
10 ............... (F)³ G A B C D E F G A B C D E F G 
10 (boys) ........ F G A B C D E F G A B C D E F G 
10 (girls) ....... G A B C D E F G A B C D E F G A 
Adults (men) ..... (D)⁴ E F G A B C D E F G A B C D E F G A B C 
Adults (women) ... C D E F G A B C D E F G A B C D E F G A 

1 Sung by forty-nine per cent of the children. 
2 Sung by forty-eight per cent of the children. 
3 Sung by forty-eight per cent of the children. 
4 Sung by forty-nine per cent of the men. 
5 Includes the falsetto. 


Limited data obtained in another approach to the study of chil- 
dren’s singing are not entirely consistent, however, with this finding. 
In the study by the writers previously referred to, an effort was made to 
obtain records of the tones most frequently used by children in their 
chanting and humming during spontaneous play. Obviously, it is a 
difficult thing to obtain accurate records of vocalizations of this sort, 
and for this reason no claim can be made as to the reliability of the 
records. However, the results, such as they are, are of interest in 
comparison with the results obtained under more formal test conditions. 
In order to obtain the records, the experimenter devoted one hundred 
minutes of observation to each of eighteen children, aged thirty-one to 
forty-eight months, with an average age of thirty-eight months. Each 
child was followed as unobtrusively as possible, and each time the child 
sang, hummed or chanted, the experimenter tried to identify the tones 
by means of a pitch pipe, and then made a written record of the items. 
Since the children were accustomed to being studied by observers in 
other investigations, this procedure was not quite as artificial as it 
might seem at first glance. 
The tones identified by the observer during this procedure numbered 
nine hundred twenty; of this number, nine hundred fifteen, or ninety-nine 
per cent of the tones occurred within the range from first line E to fourth 
space E. Only fifty per cent of the tones occurred within the range from 
Middle C to A (presented as the tentative norm for three-year-olds in 
Table III) while fifty per cent of the tones were above A. Since the 
method of obtaining the data on spontaneous singing was none too certain 
and since the figures do not include an equal number of vocalizations 
by all children but were influenced by the idiosyncrasies of the children 
who happened to sing most while the observations were in progress, the 
limited data with respect to children’s spontaneous singing cannot be 
given as much weight as the results of the more systematic tests. But 
the findings do suggest that a child, when vocalizing during his free 
play, may be likely to sing tones within his high more frequently than 
within his low register. The observations of children’s spontaneous 
singing also suggest the possibility that the series of tones sung by 
fifty per cent of the children at each age level (as shown in Table III) 
might include more high tones if records could have been obtained 
under completely spontaneous conditions. Even though every effort 
was made to encourage the child to be at his ease during the administra- 
tion of the pitch tests, and even though the policy of testing each 
child on three separate occasions should, presumably, help toward this 
end, it still is possible that some subjects did not succeed in “letting 
themselves go” and did not sing as wide a range of tones as they might 
have sung if they had given vent to uninhibited, spontaneous song. 
When this implied criticism has been made, however, the statements 
with respect to children’s ability to sing low tones still are supported by 
the data.¹ Further comments with respect to the possible effect of 
embarrassment on the singer’s performance will be introduced at a 
later point. 

1 In a study by Hattwick, M. S.: “The Rôle of Pitch Level and Pitch Range in 
the Singing of Preschool, First Grade and Second Grade Children.” Child 
Development, Vol. IV, 1933, pp. 281-291, of ninety-five children aged 4½ to 8 years, 
it was found that the mean pitch used by children when singing tones of their own 
choosing in an experimental situation was significantly lower than the pitch level 
assigned to the same songs as printed in songbooks for children of this age. In an 
unpublished study by S. F. Bienstock it was found that seventeen 5- to 8½-year- 
old children were better able to incorporate tones below Middle C than tones above 
fourth line E into their songs. 

Tests of the Singing of Tonal Intervals.—The findings with respect 
to the singing of intervals are limited to the results obtained in tests of 
two small groups of children. One group included forty-seven children 
aged thirty-one to forty-eight months, the other, twenty-three children 
aged 3 to 8½ years. The results obtained with the former group have 
been reported in an earlier study already referred to. It was found 
that the closer intervals (i.e., the seconds and thirds) were reproduced 
more readily than the wider intervals (fourths and fifths). Ascending 
and descending intervals were reproduced almost equally well. In tests 
of the latter group, it was found that the major and minor seconds and 
thirds and the perfect fourths were reproduced somewhat more fre- 
quently than intervals wider than these, including the octave, but the 
differences were small and practically disappeared when retests were 
administered after a period of training. 

In an earlier report of the singing of three-year-old children, the 
authors presented findings that were at variance with the rule, set 
forth in some manuals, that the chromatic or half-step interval should 
not be stressed in songs for young children.¹ The results in subsequent 
tests and observations made in the study of the twenty-three 3- to 8½- 
year-old children mentioned above were in keeping with these findings. 
In this group, the score of the minor second or chromatic interval 
(representing the average number of children singing the ascending and 
descending interval, as measured by the judgment of the experimenter) 
was 22.5; the corresponding scores of the major second, major third 
and minor third intervals were, respectively, 23, 22, and 21.5. More- 
over, when the chromatic interval was included in songs introduced in 
the training of the seventeen oldest children in this group it was sung 
by all but one child. The limited results in this as well as the earlier 
study suggest that the rule against the use of the chromatic interval in 
songs for young children is ill-advised and imposes an unnecessary 
restriction upon the material which might be used in children’s songs. 

1 Following is a snatch of song sung by a three-year-old nursery school child as 
recorded during observations of children’s spontaneous singing: 

“Da-Da Da | Da | Da-Da | Da-Da | Da-Da” 
 ...  C  D  C#  C  C#  D 


THE EFFECT OF TRAINING ON CHILDREN’S TONAL RANGE 


Two studies, one of which has been reported elsewhere, consist of 
experiments conducted by the authors to investigate the effects of 
training on children’s ability to sing tones, intervals and songs.¹ 
The latter study involved thirty-six children, ranging in age from 
thirty-one to forty-eight months, with an average age of thirty-nine 
months at the beginning of the investigation. After initial tests of 
the children’s ability to reproduce tones and intervals had been admin- 
istered, the subjects were divided into two groups, matched as closely 
as possible with respect to singing scores and age. Thereupon training 
in the singing of tones and intervals was given to one of the groups, 
consisting of eighteen children. Each child received forty periods of 
practice of about ten minutes each, extended over a period of approxi- 
mately six months. The other group of children served as control 
subjects and received no practice from the experimenter during this 
time. At the end of the training period (late in the spring of the year) 
both groups were retested. Following this, no further work was done 
with the children until the ensuing fall when as many subjects as could 
be found were tested once more. 

The children who received practice made large gains. At the 
beginning of the study the authors did not anticipate the magnitude 
of these gains, and one unfortunate result of this was that the initial 
test covered a range of only eleven tones (from middle C to F inclusive). 
Soon after the beginning of training it was necessary to 
extend the range of tones. At the end of the training period the 
practiced subjects sang a reliably larger number of tones than did 
the control subjects, not only in tests of the eleven tones initially 
included in the experiment but also in tests covering a range of eighteen 
tones. When re-tests were administered in the ensuing fall, several 
months after the termination of practice, the trained children still 
retained a statistically reliable superiority over the control subjects.² 
1 Jersild, A. T. and Bienstock, S. F.: Op. cit. 


2 Jersild, A. T. and associates: Training and Growth in the Development of 
Children. Child Development Monographs No. 10, 1932. 
Further data concerning the effect of practice were obtained by one 
of the writers¹ in a study of an additional group of twenty-three 
children ranging in age from 3 to 8½ years. Each child was tested 
individually four times, on as many separate days, at the beginning 
of the study and twice at the end. During the training period, the 
children were divided into three groups; one group included five 
children aged three and four years, another included eight children 
aged 5 to 6½ years, and the third included ten children aged 6½ to 
8½ years. During the training periods, the experimenter worked 
with one group at a time. The average duration of the training 
period was twenty minutes in the case of the younger and thirty 
minutes in the case of the older children. A part of the training 
period was devoted to work with each group as a whole in singing 
songs and a part was devoted to individual children while the rest 
of the group listened. Some attention was given to drill in singing 
tones bordering on the extremes of each child’s vocal range. Apart 
from the initial and final tests, each group attended thirty-eight 
practice sessions distributed over a period of sixty days. 

When scores on the last of the four tests of tonal range given at the 
beginning of the study were compared with the scores on the final 
test given at the end of the experiment, the twenty-three subjects 
showed a gain of thirty-eight per cent (the average score on the last 
of the four tests given at the beginning of the study was 14.13 tones, 
as compared with an average of 19.48 tones in the test administered 
at the end of the study). The youngest group (five children, aged 
forty-one to fifty-six months) improved their scores from an average 
of 13.2 to 17.2 tones, a gain of about thirty per cent; the second group 
(eight children, aged sixty-three to seventy-nine months) improved 
from 13.6 to 18.5 tones, a gain of thirty-six per cent; the oldest group 
(ten children, aged eighty to one hundred four months) improved 
from 15 to 21.4 tones, a gain of forty-three per cent. All groups, 
it can be seen, gained as the result of practice. Since the number 
of subjects is small and more practice was given to the older children, 
no significance can be attached to the difference between the amount 
of the respective gains of each group. Due to a difference in the 
amount of preliminary practice given during initial tests, the greater 





1 Bienstock, S. F.: A study of the effect of training on children’s ability to sing. 
Unpublished. 
concentration of the training periods, and the smaller amount of 
individual attention given to each child in this study as compared 
with the study of three-year-old children reviewed above, no direct 
comparison can be made between the results of the two studies. 
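
The percentage gains quoted above can be rechecked from the reported averages. The following is a minimal sketch in Python, not part of the original study; the dictionary keys and the helper name `percent_gain` are labels chosen here for illustration, while the scores themselves are those given in the text.

```python
# Average tonal-range scores (number of tones) before and after training,
# as reported in the text. The group labels are illustrative names only.
scores = {
    "all 23 children": (14.13, 19.48),
    "youngest (5 children)": (13.2, 17.2),
    "middle (8 children)": (13.6, 18.5),
    "oldest (10 children)": (15.0, 21.4),
}


def percent_gain(before, after):
    """Gain expressed as a percentage of the initial score."""
    return 100.0 * (after - before) / before


gains = {group: percent_gain(b, a) for group, (b, a) in scores.items()}
for group, gain in gains.items():
    print(f"{group}: {gain:.1f} per cent")
```

Rounded to whole numbers, the computed gains agree with the thirty-eight, thirty, thirty-six and forty-three per cent stated in the text.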

In view of the gains noted again in this limited study, the question 
arises as to how lasting such gains might be. It is possible that a 
child might improve considerably during a period of intensive training 
without maintaining this advantage unless his training were con- 
tinued indefinitely. In the study just described, no control group 
was available, but a meager indication of the permanency of the 
improvement can be obtained by indirect methods. A total of twelve 
of the twenty-three children described above were retested, without 
any intervening special training at the hands of the experimenter, 
after the elapse of an interval of two years from the time when the 
experiment had come to an end. Some indication of the effects of 
the training they had received might be obtained by comparing their 
scores with the tentative norms given in Table I. To accomplish 
this, each child’s “achievement quotient” in singing (child’s own score 
divided by the average score of like-sexed children of his own age) was 
calculated for each of the twelve children on the basis of his score and 
age (1) at the beginning of the experiment; (2) at the end of the period 
of training; and (3) at the time of the later retests. The respective 
average “achievement quotients” (multiplied by 100, to make the 
figures whole numbers) at the beginning and end of the training period 
were 105 and 168. If the children had maintained the improvement 
effected by training, the average on the subsequent retest should 
correspond to the latter figure (if we assume that the achievement 
quotient normally remains constant) while, on the other hand, if 
the training had effected only a transitory gain, the average achieve- 
ment quotient on retests should decline to the level of the first test. 
The average achievement quotient actually found on the subsequent 
retests was 123. This suggests that the children still retained some 
of the benefit achieved through a brief period of special training, even 
after the elapse of two years, but the data are too meager, and the 
intervening conditions too poorly controlled, to permit a definite 
conclusion.¹ 
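
The reasoning about the achievement quotient can be put in a few lines of arithmetic. This is a hedged sketch only: the function name is hypothetical, the example scores passed to it are invented for illustration, and the "retained fraction" is a summary figure computed here, not one reported by the authors. It rests on the same assumption the text makes, namely that the quotient would normally remain constant.

```python
def achievement_quotient(child_score, norm_score):
    """Child's own score divided by the norm for his age and sex, times 100."""
    return 100.0 * child_score / norm_score


# Hypothetical example: a child singing 21 tones where the norm is 20.
example_quotient = achievement_quotient(21, 20)

# Average quotients reported for the twelve retested children.
aq_start, aq_end, aq_retest = 105.0, 168.0, 123.0

# Share of the gain made during training that was still present two years
# later (an illustrative summary, not a figure given in the source).
retained_fraction = (aq_retest - aq_start) / (aq_end - aq_start)
print(f"{retained_fraction:.2f}")
```

On these figures somewhat less than a third of the gain in quotient remained at the retest, which is in keeping with the cautious wording of the text.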





1 Evidence that tonal discrimination improves with practice is reported in a 
study by Hissen, Irene: (“A new approach to music for young children.” Child 
Development, Vol. IV, 1933, pp. 308-317) involving twenty-seven children aged 
SOME PRACTICAL OBSERVATIONS WITH REGARD TO TRAINING IN SINGING 


The data reviewed above, as well as incidental observations made 
during the course of work with children, suggest a few additional 
comments. 

During experimental work with children, it was sometimes observed 
that a child who seemed unable or unwilling to sing a particular song 
would participate when the song was transposed to a higher or lower 
key. For example, a particular four-year-old child might remain 
silent when a song was sung in the key of F but make an attempt to 
sing when the same song was transposed down to the key of C, or, 
in another case, when transposed up to the key of B flat. In instances 
such as this, a record of a test of the child’s voice is likely to reveal 
that his voice is placed higher or lower than that of his average peer. 

Failure to take account of the voice range of a particular child may 
lead to a mistaken impression of his ability to sing. In some cases 
observed during the study, children who had been set down as incom- 
petent singers by their teachers not only made relatively high scores, 
consisting chiefly of tones lower than the tones usually included in 
the songs presented to them, but also improved considerably after a 
period of special training. 

Children undoubtedly differ in native potential ability to sing, and 
there is no reason to believe that all individuals would be equally 
competent if given equal encouragement and training. But even 
though this is recognized, it still may be true that many children do 
not learn to make use of their potential abilities due to lack of encour- 
agement and training. It has been observed that a child may become 
accustomed to being regarded as incapable of singing, and become 
resigned to regarding himself as unable to sing. He may remain in 
a situation analogous to that of the non-swimmer who has developed 
the habit of standing on the bank while others leap into the water 
and try to swim. If he once were helped to make the leap and to 
acquire the beginnings of skill he might eventually excel the others. 





twenty-one to fifty-four months at the beginning of the investigation. Improve- 
ment was noted also in accuracy in the reproduction of tones. Hissen further 
noted that during training the children gained in self-confidence in the use of their 
voices and had formed good listening habits which might carry through to later 
years. 
An illustration of marked change effected by practice appears 
in the case of a three-year-old girl. In her normal conversation, 
this girl used lower tones than the usual child of her age. The girl’s 
mother, likewise, used a limited range of low tones in her conversation; 
the mother claimed that she herself could not sing and expressed 
concern over the fact that her child also appeared to have no ability. 
In the initial tests, the child’s performance conformed to this picture: 
Her singing was limited to three very low tones. However, as the 
experiment progressed, she seemed to “find” her voice for higher tones 
and she began to improve rapidly in singing high notes. At the end of 
the training period, when tests were administered to thirty-six practiced 
and control subjects, this child earned next to the highest score. The 
marked change in this instance might, it is true, have been due to 
fortuitous factors or might possibly have been achieved at a later 
time apart from the practice given during the experiment; although 
this is recognized, a case such as this does suggest that a particular 
child’s limitations may be due in part to lack of practice rather than 
to lack of ability and that special training at an early age may help 
the child to forestall the habit of using only a limited part of his 
potential tonal range. 

Occasionally children showed signs of embarrassment during the 
course of the tests. Such signs usually grew less noticeable after a 
few tones had been sung, but, as suggested at an earlier point, the 
data do not indicate the influence that a subject’s embarrassment 
might have on his score. Although every effort was made during 
the singing tests to encourage the child to be at his ease and to perform 
at his best, it was impossible to gauge how successful this effort was. 
Moreover, it is possible that a subject who did not learn to feel com- 
pletely relaxed during the course of the three tests might be handi- 
capped more in the singing of high tones than in the singing of low 
tones. This point is suggested by the findings in observations of the 
spontaneous singing of three-year-old children: as described earlier, 
it appeared that children employed relatively high tones more fre- 
quently than low tones when singing to themselves (e.g. second space 
A, B, C, D occurred much more frequently than middle C, D, E and F). 
Moreover, the findings obtained in studies of the effect of training 
indicated that the gains made by children during training included 
proportionately more high than low tones. Although both of these 
observations are suggestive rather than conclusive, they do imply 
first, that the scores representing the tentative norms reproduced in 
Table I might have been larger at each age level if records could have 
been obtained under completely spontaneous conditions; second, that 
proportionately more high than low tones might be added to the series 
representing the tones sung by fifty per cent of the children at each 
age level (Table III); and third, that a part of the marked improvement 
exhibited by the subjects who received training was due to increased 
freedom and relaxation on the part of the children, through the effect 
of frequent contact with the experimenter and the singing situation, 
rather than due exclusively to practice in the singing of particular 
tones. 

The data do not provide adequate proof for any of these state- 
ments, but they still must be recognized as constituting an implied 
criticism of the present results and should be taken into account as 
reservations in interpretations of the data. As against these reserva- 
tions the claim might be made that the conditions under which the 
children were tested were perhaps no more formal or embarrassing 
than the conditions under which the child must sing when he receives 
formal musical instruction at school. 

During the course of the training administered in the study involving 
twenty-three children aged 3 to 8½ years, marked changes were 
noted in the behavior of children who appeared to be shy and inhibited 
when the project was begun. Frequent contact with singing and the 
example of other children appeared to be helpful. Moreover, it 
appeared that group singing, which gave a child an opportunity to 
sing without being conspicuous, was helpful in aiding the child to make 
full use of his voice in the presence of others, and this, in time, seemed 
to aid him in making the transition to singing alone. 

Observations made during the training of children are the basis 
for the following summary. At the beginning of training, when songs 
are first introduced, it seems best to select songs that are well within 
the child’s tonal range. A test of his ability to reproduce tones is 
helpful in discovering what this range is. When a child seems unable 
to sing, this may mean that the song that is assigned is not suited 
to his voice and should be transposed to a higher or lower key. If a 
child has only a limited tonal range, chromatic intervals should be 
introduced as a means of providing greater variety and as a means 
of avoiding too much monotony in the materials he is asked to sing. 
Many children who at first seem either inhibited or incompetent 
respond favorably to opportunities to sing in unison with others and 
to the example of hearing another competent child perform. After 
the child’s cooperation has been won, the songs introduced for the 
purpose of training should include not only tones that are within 
the child’s range but also tones which he hitherto has been unable to 
reproduce; the former should, however, be more numerous and should 
serve as the foundation of the tune. It is helpful also to drill the 
child in the singing of particular tones. In the case of preschool 
children, it may be helpful at times to utilize songs dealing with and 
accompanied by activity (such as putting the baby to sleep, imitating 
animals, etc.). Monosyllables rather than polysyllabic words are 
recommended for the songs first presented to very young children so 
that the emphasis may be on the tune rather than the mastery of words. 
In the case of children who are able to sing a wide range of tones, 
songs incorporating a wide range of tones should be used to encourage 
them to make full use of their abilities. On the same ground, it is 
also recommended that a variety of wide as well as narrow intervals 
be included in songs designed for children who are capable singers. 


GENERAL SUMMARY 


This study includes results obtained from tests of vocal reproduc- 
tion of pitch administered to four hundred seven children, aged two to 
ten years and to sixty-five adults; findings obtained in a study of the 
effect of training in vocal reproduction of pitch administered to twenty-three 
children, aged 3 to 8½ years; a brief review of findings obtained 
in an earlier study of the effects of training at the preschool level; 
some findings with respect to the reproduction of tonal intervals; 
and some practical observations¹ regarding the vocal education of 
children. 

In administering the tests of vocal reproduction of pitch, the 
experimenter sounded and sang individual tones and asked the subject 
to sing them. The aim was to find how many tones the subject could 
sing and to find what these tones were. The subject’s score was the 
number of tones correctly reproduced. The scoring was based upon 
the judgments of the experimenter. As a test of the reliability of this 
method of scoring, comparisons were made between the experimenter’s 
judgments and the judgments of six other musically trained individuals 
(two of whom possessed what has been known as absolute pitch) who 





1 These are summarized on the immediately preceding pages. 
were introduced at various times during the course of the tests. The 
experimenter and these individuals, working simultaneously but 
independently, recorded their judgments of the performance of children 
who were being tested. The average agreement between the experi- 
menter and the independent judges, based upon item by item com- 
parisons between judgments on specific tones, was 92 per cent. 

Each subject was tested three times, on as many different days 
(except in the case of twenty children who received two of their three 
tests on a single day). The correlations between children’s scores on 
the second and third tests, calculated separately at each yearly age 
level, ranged from .76 ± .05 to .97 ± .01, with an average coefficient of 
.87. 

The average and median number of tones sung by boys, girls 
and by both sexes combined at half-yearly and yearly age levels are 
shown in Table I. In each group the range of the scores is large. 

There is evidence that children achieve the ability to sing a wide 
range of tones, as compared with adults, at a relatively early age. 
The median number of tones sung by children at each respective age 
level from two to ten years follows: 4, 6, 9, 9, 14, 13.5, 15, 16 and 16. 
The median adult score was 20. As early as the age of four years, 
individual children may be able to reproduce as many tones as the 
average adult, although the child at this age may not be as capable as 
the adult in singing tones in series or in utilizing them in songs. 

Girls sang a larger number of tones than did the boys at several 
age levels, but the difference in each instance fell short of statistical 
reliability. 

A summary of the particular tones sung by fifty per cent or more 
of the children at each age level and by adults is presented in Table III. 

The evidence in this study indicates that young children are able to 
sing tones lower in pitch than the tones that have been suggested as 
appropriate in manuals dealing with the musical education of children. 

The chromatic or half-step interval (minor second) was not found 
to be significantly more difficult than diatonic intervals in tests of 
twenty-three children aged 3 to 8½ years. This finding agrees with 
results obtained in an earlier study of forty-seven children aged thirty- 
one to forty-eight months. Moreover the children also successfully 
incorporated semitones into their songs. Tests of the twenty-three 
children (who also took part in a study of the effects of training) 
indicated that narrow intervals (seconds and thirds) were sung 
somewhat more readily than wider intervals (perfect fourths and 
fifths, major and minor sixths, sevenths and octave), but after a period 
of training the difference had largely disappeared. 

The present findings with regard to the effects of training on singing 
are limited to a study of twenty-three children, aged 3 to 8½ years, 
who received individual and group practice during thirty-eight 
twenty- to thirty-minute periods distributed over a span of six weeks. 
During practice the children made an average gain of over thirty per 
cent in the number of tones they could sing. The gain in this case 
was substantial but not as marked as the gains shown by three-year- 
old children in an earlier study in which more intensive training was 
administered over a longer period of time. The evidence shows that 
appropriate training may enable a child to add many tones to his 
repertory. 

The results in an earlier study showed that children who received 
training maintained a statistically reliable advantage over initially 
equivalent control subjects for a period of several months after the 
termination of practice. Limited data in the present study suggest 
that the benefits accomplished by special training during a brief period 
of only six weeks may lead to improvement that is apparent after two 
years. Observations of individual children who received practice 
indicate that training at an early age may aid a child in overcoming a 
previously formed habit of utilizing only a limited portion of his tonal 
range. It is possible, also, that training at an early age may serve to 
forestall habits of disuse that might interfere with ability to profit from 
training at a later time. 

A major problem in education is the question as to what are the 
skills that can most profitably be cultivated in early childhood as 
contrasted with performances in which practice might more econom- 
ically be deferred until a later time. This issue has been sharpened by 
the findings in numerous research studies of the relative influence of 
learning and growth in the development of a child’s abilities. In an 
earlier study which dealt with this question, through experiments on 
several motor and mental performances, the writers found that singing 
was the one outstanding performance in which the trained children 
gained a significant advantage over control subjects. Further study 
is needed to tell how permanent such gains may be and to answer 
more precisely the question as to what is the most strategic time to 
begin a child’s training in singing. It is possible that a child who is 
encouraged to sing at an early age may acquire a degree of skill that 
could not be acquired if this training were deferred until later years. 
Although the need for further study is recognized, the findings in the 
present investigation strongly suggest that singing is one performance 
that might well be selected for emphasis in the education of young 
children. 
THE SCORING OF ALTERNATIVE RESPONSES WITH 
REFERENCE TO SOME CRITERION 


TRUMAN L. KELLEY 
Graduate School of Education, Harvard University 


Tests of the Karl M. Cowdery and the E. K. Strong type consist 
of a number of statements with reference to which the subject expresses 
like, indifference, or dislike by marking one of three options. In the 
development of the scoring scheme for such a test, the test is given to, 
say, successful engineers and to a random lot of others. Where a 
response tendency of engineers is found to be different from that of 
others, that response is credited as indicative of the type of interest 
that engineers possess. The question here considered is that of the 
specific numerical credit that should be attached to a given response 
consequent to the number of times it is marked by engineers and the 
number of times by others, or non-engineers. 

In a random population there may be forty successful engineers in a 
thousand, and a four-fold table giving the results for an item might be 
as follows: 


TABLE I

                 Item marked   Item not marked   Total
Engineers             28              12            40
Others               500             460           960
Total                528             472          1000













The question has been raised as to whether a determination of the 
weight that should be credited to marking the item as evidence of 
engineering interest should be determined from this table or one modified 
to give less importance to the responses from the “others.” It is 
argued that the question asked is how do engineers as a group differ 
in their attitude toward this item from non-engineers as a group. 
It is one group against another group, and the sizes of the two groups 
are intentionally ruled out of the picture. In this case the groups 
should be equalized for size, yielding the accompanying four-fold table 
(Table II). Here the results from the small number of engineers have 
been given the same importance as those from the large number of 
others. For the purpose of studying the contrast between engineering 
interests and other interests, this would seem to be reasonable. 


TABLE II

                 Item marked   Item not marked   Total
Engineers            .350            .150          .500
Others               .260            .240          .500
Total                .610            .390         1.000














However, if the purpose is to take an individual drawn from the 
general population and then note how far he diverges from it in the 
direction of the interests of engineers, it would seem that the original 
or non-equalized table should constitute the basic data. 
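
The equalization that turns Table I into Table II may be sketched as follows. This is an illustrative reconstruction only, assuming that "equalizing" means rescaling each row (engineers, others) to the same total before expressing the cells as proportions; the function name `semi_equalize` is hypothetical and does not come from the paper.

```python
# Counts from Table I: rows are (engineers, others),
# columns are (item marked, item not marked).
table_i = [[28, 12], [500, 460]]


def semi_equalize(counts):
    """Give every row the same total weight; all cells then sum to 1."""
    n_rows = len(counts)
    return [[cell / sum(row) / n_rows for cell in row] for row in counts]


table_ii = semi_equalize(table_i)
column_totals = [sum(col) for col in zip(*table_ii)]

for row in table_ii:
    print("  ".join(f"{cell:.3f}" for cell in row))
print("  ".join(f"{total:.3f}" for total in column_totals))
```

To three decimals the result reproduces the entries of Table II, including the column totals .610 and .390.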

The writer has developed a scheme for scoring items based upon 
data of the equalized table (more accurately semi-equalized because 
equalized with reference to one variable only). According to this 
development the weight to be attached to a response is proportionate 
to 

$$\frac{\phi}{(1 - \phi^2)\,\sigma} \tag{1}$$

where $\phi$ is the product-moment coefficient of correlation from the 
semi-equalized four-fold table and $\sigma$ is the standard deviation of the 
item variable. (If marking the item is credited 1, and not marking it 
credited 0, then this standard deviation equals $\sqrt{pq}$ where $p$ is the 
proportion for the table entire marking the item, and $q$ the proportion 
not marking it.) The factor $(1 - \phi^2)$ is proportional to the square of 
the standard error of $\phi$.¹ It is appropriate to weight a factor inversely 
as the square of its standard error when it is combined with others of 
different reliability to obtain the most reliable average or sum.² This 
scoring formula has been recommended by the writer to Drs. Cowdery, 
Strong, and others. He now wishes to withdraw it for the reason that 
instead of $(1 - \phi^2)$ which is proportional to the square of the standard 
error of $\phi$, there should have been a quantity proportional to the square 
of the standard error of $(\phi/\sigma)$. The present article provides such a 
formula (11 following), and recommends its use when semi-equalized 
tables are taken as the tables of basic data. 
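
For orientation, the earlier (now withdrawn) weight of formula (1) may be evaluated on the semi-equalized figures of Table II. The sketch below is illustrative only; the variable names are chosen here, and the fourfold correlation is computed in the usual way as (ad - bc) divided by the square root of the product of the four margins.

```python
import math

# Semi-equalized proportions from Table II:
# rows (engineers, others), columns (item marked, item not marked).
a, b = 0.350, 0.150
c, d = 0.260, 0.240

p, q = a + c, b + d      # proportions marking / not marking the item
P, Q = a + b, c + d      # engineers / others (both .5 after equalizing)

# Product-moment (fourfold) correlation of the two 0/1 variables.
phi = (a * d - b * c) / math.sqrt(p * q * P * Q)

# Standard deviation of the item variable scored 1 or 0.
sigma = math.sqrt(p * q)

# Formula (1), up to a constant of proportionality.
weight_1 = phi / ((1 - phi ** 2) * sigma)
print(f"phi = {phi:.4f}, weight = {weight_1:.4f}")
```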





1 See Kelley, Statistical Method, formula [217]. 
2 See Kelley, Statistical Method, formula [309]. 
It, however, seems that for guidance purposes the more important 
question to ask and answer has to do with the deviation of the one 
examined from the general population in the direction of the type of 
interest of those in the vocation being considered, and that therefore 
tables not equalized should constitute the basic data. 

Let us start with such a table and derive the resulting weight for 
any given response. Since the engineers are not graded into less and 
more successful, nor the non-engineers into greater and less failures as 
engineers, this variable must be considered to be a point distribution. 
A consequence of this is that the resulting score on a test composed of 
many such items will tend to inform the subject whether his interest is 
like that of the average successful engineer and nothing else, that is, a 
high score on the engineering interest scoring scheme means interest 
like this average successful engineer and not interest like the superior 
engineer, and the higher the score the more like this mean engineer. 

We thus have a point distribution in the vocational variable. We 
also have such a distribution in the test item variable, for the item is 
either marked or not marked. Of course, those who assert a preference 
by marking the item do not all have the same attachment to it, but as 
there is no possible means by looking at the mark of knowing whether 
the preference is strong or weak, all who mark the item must be attributed 
with some single interest or attitude. Accordingly this variable 
which is the independent variable, by means of which the vocational 
variable is to be estimated, must be treated as a point distribution. 
Thus, to measure relationships here, tetrachoric r or any other device 
assuming continuity in the item variable is out of place, and assumption 
of continuity in the vocation variable will not change the regression 
of it upon the item variable, so again nothing is to be gained by treating 
the vocation variable as continuous. 

Let the item variable be considered the first variable and the voca- 
tion the second variable. Then the problem is to estimate the second 
from a knowledge of the first. The weight that attaches to the first in 
doing this becomes the credit which is to be attached to the response 
in question. A similar and independent determination is made for 
each other item. The reader will immediately note that the inter- 
correlation of items has been neglected. This is admittedly a serious 
neglect dictated by circumstances, for an adequate study of the intercorrelations 
of all of the responses to all the items, and the utilization 
of these in determining optimum weights for each of the responses will 
ordinarily involve so much labor that it cannot be undertaken. 
We will start with the non-equalized four-fold and express the 
frequencies in cells and in the margin as proportions of the total. To 
further simplify we will express the frequency in the upper left cell as 
equal to the chance frequency, or that in case there is no correlation 
between vocation and tendency to give the response, plus A. Thus A 
is the cell divergence expressed as a proportion. It may readily be 
shown that the same cell divergence exists for the other cells with signs 
as given in the accompanying four-fold: 














TABLE III

                                    Variable x₁
Variable x₂                 Item marked   Item not marked    f₂     x₂
Present in vocation (1)       pP + A          qP - A          P      Q
Absent from vocation (0)      pQ - A          qQ + A          Q     -P
f₁                              p               q
x₁                              q              -p
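
The statement that every cell departs from its chance frequency by the same amount A, with alternating sign, can be checked on the figures of Table I. The following is a small illustrative sketch; the variable names are chosen here and are not the author's.

```python
# Table I as proportions of N = 1000:
# rows (engineers, others), columns (item marked, item not marked).
N = 1000
cells = [[28 / N, 12 / N], [500 / N, 460 / N]]

P, Q = sum(cells[0]), sum(cells[1])          # vocation margins
p = cells[0][0] + cells[1][0]                # proportion marking the item
q = cells[0][1] + cells[1][1]                # proportion not marking it

# Cell proportions expected if vocation and response were uncorrelated.
chance = [[p * P, q * P], [p * Q, q * Q]]

divergence = [[cells[r][c] - chance[r][c] for c in range(2)] for r in range(2)]
A = divergence[0][0]
print(f"A = {A:.5f}")
```

For Table I this gives A = .028 - (.528)(.040) = .00688, and the other three cells diverge by -A, -A and +A respectively.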

















The basic constants desired are the regression coefficient, be:, and 
its variance, o%,,, yielding w or be:/c%,,, which is the weight to be 
attached to the response. Vocation is called the zx. variable and 
presence in the vocation is called 1, while absence from it is called 0. 
Variable xz; is credited 1 if the response is marked, and 0 if not marked. 
For the z; variable we have the distribution of Table IV and resulting 
constants as computed. 






































TABLE IV

Raw score                   Deviation from mean score
  X1      f1     f1X1         x1      f1x1     f1x1²      f1x1³          f1x1⁴
  1       p       p           q        pq       pq²        pq³            pq⁴
  0       q       0          −p       −pq       p²q       −p³q            p⁴q
Sum      1.0      p                    0        pq       pq(q − p)     pq(1 − 3pq)
                 M1 = p                        = p20      = p30          = p40
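The column sums of Table IV can be checked mechanically. The following is a minimal sketch, not part of the original paper: it treats the response as a score that is 1 with proportion p and 0 otherwise, and computes the central moments with exact fractions. The function name `two_point_moments` and the value p = 3/10 are illustrative assumptions.

```python
from fractions import Fraction


def two_point_moments(p):
    """Mean and 2nd to 4th central moments of a score that is 1 with proportion p, else 0."""
    q = 1 - p
    mean = p  # the mean of a 0/1 score is the proportion scoring 1
    # (frequency, deviation from mean): q for "marked", -p for "not marked"
    rows = [(p, 1 - mean), (q, 0 - mean)]
    p20, p30, p40 = (sum(f * x**k for f, x in rows) for k in (2, 3, 4))
    return mean, p20, p30, p40


p = Fraction(3, 10)  # hypothetical proportion marking the response
q = 1 - p
mean, p20, p30, p40 = two_point_moments(p)
print(mean, p20, p30, p40)
```

Because the arithmetic is exact, the results can be compared directly with the closed forms pq, pq(q − p), and pq(1 − 3pq) given in the last rows of the table.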
Using a customary notation we define certain moments as follows: 

$$p_{20} = \frac{\sum x_1^2}{N}, \quad p_{40} = \frac{\sum x_1^4}{N}, \quad p_{02} = \frac{\sum x_2^2}{N}, \quad p_{11} = \frac{\sum x_1 x_2}{N}, \quad p_{31} = \frac{\sum x_1^3 x_2}{N}, \quad p_{22} = \frac{\sum x_1^2 x_2^2}{N} \tag{2}$$





These are needed in our problem and we obtain them by straight- 
forward calculation from Table III. 


$$\begin{aligned}
p_{20} &= pq \\
p_{40} &= pq(1 - 3pq) \\
p_{11} &= A \\
p_{31} &= A(1 - 3pq) \\
p_{22} &= pqPQ + A(q - p)(Q - P)
\end{aligned} \tag{3}$$

The regression coefficient

$$b_{21} = \frac{p_{11}}{p_{20}} \tag{4}$$
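The product moments can be verified by summing over the four cells of Table III. The sketch below is illustrative only: the function name `fourfold_moments` and the values of p, P, and A are assumptions, chosen so that every cell proportion stays positive. The closed forms it checks are those that follow algebraically from the four-fold, including p31 = A(1 − 3pq).

```python
from fractions import Fraction


def fourfold_moments(p, P, A):
    """Product moments of the four-fold table, taken about the means p and P."""
    q, Q = 1 - p, 1 - P
    # (cell proportion, deviation of x1, deviation of x2)
    cells = [
        (p * P + A, q, Q),    # item marked, in vocation
        (q * P - A, -p, Q),   # item not marked, in vocation
        (p * Q - A, q, -P),   # item marked, not in vocation
        (q * Q + A, -p, -P),  # item not marked, not in vocation
    ]

    def moment(a, b):
        return sum(f * x1**a * x2**b for f, x1, x2 in cells)

    return {
        "p20": moment(2, 0),
        "p40": moment(4, 0),
        "p11": moment(1, 1),
        "p31": moment(3, 1),
        "p22": moment(2, 2),
    }


# Hypothetical proportions, chosen only for illustration.
p, P, A = Fraction(1, 5), Fraction(1, 10), Fraction(1, 50)
q, Q = 1 - p, 1 - P
m = fourfold_moments(p, P, A)
b21 = m["p11"] / m["p20"]  # regression coefficient, formula (4)
print(m, b21)
```

With these values the regression coefficient reduces to A/(pq), which is the form used in the derivation of the weight.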


We will now calculate $\sigma^2_{b_{21}}$. Taking logarithmic differentials we have: 

$$\frac{db_{21}}{b_{21}} = \frac{dp_{11}}{p_{11}} - \frac{dp_{20}}{p_{20}} \tag{5}$$





Squaring, summing, and dividing by the number of samples, 


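$$\frac{\sigma^2_{b_{21}}}{b_{21}^2} = \frac{\sigma^2_{p_{11}}}{p_{11}^2} + \frac{\sigma^2_{p_{20}}}{p_{20}^2} - \frac{2\,\sigma_{p_{11}}\sigma_{p_{20}}\,r_{p_{11}p_{20}}}{p_{11}\,p_{20}} \tag{6}$$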








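Evaluating the various terms in the right hand member by formulas 
derived by Pearson,¹ we obtain 

$$N\sigma^2_{p_{11}} = p_{22} - p_{11}^2 = pqPQ + A(q - p)(Q - P) - A^2 \tag{7}$$

$$N\sigma^2_{p_{20}} = p_{40} - p_{20}^2 = pq(q - p)^2 \tag{8}$$

$$N\sigma_{p_{11}}\sigma_{p_{20}}\,r_{p_{11}p_{20}} = p_{31} - p_{11}p_{20} = A(q - p)^2 \tag{9}$$

$$w = \frac{b_{21}}{\sigma^2_{b_{21}}} = \frac{N}{PQ} \cdot \frac{A}{1 + \dfrac{iA(Q - P)}{PQ} - \dfrac{jA^2}{PQ}} \tag{10}$$

wherein $i = \dfrac{q - p}{pq}$ and $j = \dfrac{1 - 3pq}{p^2q^2}$.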
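As a check on the algebra, the weight can be computed two ways: directly from formulas (4) and (6) to (9), and from the closed form with the tabled quantities i = (q − p)/(pq) and j = (1 − 3pq)/(p²q²). The sketch below assumes that reading of (10); the function names and the sample size N = 1000 are illustrative, not taken from the paper.

```python
import math


def weight_from_moments(p, P, A, N):
    """Weight b21 / var(b21), built step by step from formulas (4) and (6)-(9)."""
    q, Q = 1 - p, 1 - P
    p11, p20 = A, p * q
    p22 = p * q * P * Q + A * (q - p) * (Q - P)
    var_p11 = (p22 - p11**2) / N          # formula (7)
    var_p20 = p * q * (q - p) ** 2 / N    # formula (8)
    cov = A * (q - p) ** 2 / N            # formula (9)
    b21 = p11 / p20                       # formula (4)
    rel_var = var_p11 / p11**2 + var_p20 / p20**2 - 2 * cov / (p11 * p20)
    return b21 / (b21**2 * rel_var)       # formula (6) solved for var(b21)


def weight_closed_form(p, P, A, N):
    """Closed form of the same weight, using the tabled quantities i and j."""
    q, Q = 1 - p, 1 - P
    i = (q - p) / (p * q)
    j = (1 - 3 * p * q) / (p * q) ** 2
    return (N / (P * Q)) * A / (1 + i * A * (Q - P) / (P * Q) - j * A**2 / (P * Q))


P, N = 0.1127, 1000  # N is a hypothetical sample size
for A in (0.006, -0.006):
    w = weight_closed_form(0.06, P, A, N)
    # Dividing by N/PQ gives the scale used for the "Weight by (10)" column.
    print(A, round(w * P * (1 - P) / N, 3))
```

For p = .06 and P = .1127 the rescaled weights agree with the first two rows of Table VI to the three decimals printed there.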





¹ See Kelley, Statistical Methods, formulas [105] and [106]. 

The quantities $i$ and $j$ may be tabled. A short table is given herewith. 














TABLE V

  p       1/(pq)     i = (q − p)/(pq)     j = (1 − 3pq)/(p²q²)
 .01      101.0          98.99                9900.
 .02      51.02          48.98                2450.
 .04      26.04          23.96                600.0
 .06      17.73          15.60                261.2
 .08      13.59          10.21                128.6
 .10      11.11          8.889                90.12
 .15      7.843          5.490                37.99
 .20      6.250          3.750                20.31
 .25      5.333          2.667                12.44
 .30      4.762          1.905                8.390
 .35      4.396          1.319                6.135
 .40      4.167          .8333                4.861
 .45      4.040          .4040                4.204
 .50      4.000          .0000                4.000
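The entries of Table V follow directly from p, so the table can be regenerated or extended to other values. This is a minimal sketch under the reading that the three tabled quantities are 1/(pq), (q − p)/(pq), and (1 − 3pq)/(p²q²); the function name `table_v_row` is an assumption. Recomputing in this way reproduces every printed row to the precision shown except the row for p = .08, where the sketch gives about 11.41 and 143.8 against the printed 10.21 and 128.6; this may reflect a slip in the printed table or in its transcription.

```python
def table_v_row(p):
    """Return (1/pq, (q - p)/pq, (1 - 3pq)/(pq)^2) for a given proportion p."""
    q = 1 - p
    pq = p * q
    return 1 / pq, (q - p) / pq, (1 - 3 * pq) / pq**2


for p in (0.01, 0.02, 0.04, 0.06, 0.08, 0.10, 0.15, 0.20,
          0.25, 0.30, 0.35, 0.40, 0.45, 0.50):
    inv_pq, i, j = table_v_row(p)
    print(f"{p:.2f}  {inv_pq:9.3f}  {i:9.4f}  {j:10.3f}")
```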








In the case of the semi-equalized table $Q = P$ and the weight 
becomes 

$$w = 4N \cdot \frac{A}{1 - 4jA^2} \tag{11}$$


In formula (10) the factor N/PQ, and in formula (11) the factor 4N, 
may be omitted as they are constants. In the special case when P = Q 
and p = q, (10) becomes (1), except for a constant factor. For a few 
sample cases when P = .1127, we have the following parallel weights 
by (1) and (10). The weights by (10) have been divided by N/PQ, and 
those by (1) have been multiplied by a constant, so as to reduce them to terms comparable 
to formula (10) weights when P = Q = p = q = .5. 

The discrepancies of the weights as given by formula (1) may be 
summarized in the statement that small A’s (synonymous with “very 
likely to occur as a matter of chance”) are overweighted and large 
A’s underweighted. Formula (1) does seem to yield results sufficiently 
different from those given by (10) to warrant abandoning it entirely 
and using (10) in the case of non-equalized tables, and using formula 
(11) in the case of semi-equalized tables. 
TABLE VI

                   Maximum and                           Weight by (1),
  p        A       minimum possible    Weight by (10)    .0254 A/(PQpq − A²)
                   values of A
 .06      .006        .0532               .004               .027
 .06     −.006       −.0068              −.033              −.027
 .10      .01         .0887               .006               .028
 .10     −.01        −.0113              −.045              −.028
 .20      .02         .0905               .013               .032
 .20     −.02        −.0225              −.059              −.032
 .30      .03         .0692               .022               .037
 .30     −.03        −.0340              −.062              −.037
 .50      .05         .0567               .056               .056
 .50     −.05        −.0567              −.056              −.056
 .70      .03         .0342               .062               .037
 .70     −.03        −.0790              −.022              −.037
THE INFLUENCE OF TRUE-FALSE ITEMS ON 
SPECIFIC LEARNING 


NOEL KEYS 


University of California 


The most frequently voiced criticism of objective examinations as 
aids to learning is that they provide no occasion for students to organ- 
ize and express their ideas. As offsetting this, it has been argued that 
the encounter with objective test items serves to acquaint the student 
with much specific and significant information which would otherwise 
escape his attention. 

Experimental literature in support of the latter contention is, however, 
both meager and conflicting. Hertzberg, Heilman, and Leuenberger³ 
have reported two equivalent group experiments with classes 
in educational psychology. In each, the experimental section provided 
with extensive practice upon objective tests as study aids showed a 
statistically significant superiority over the control, amounting to 
twelve or fifteen per cent upon examinations over the same content. 
When, however, a final examination was administered to both sections 
unpreceded by review on practice test materials, the experimental 
group scored no higher than the controls. 

Jersild⁴ had previously discovered that two university groups given 
true-false examinations as pre-tests before reading and lectures on 
certain phases of psychology showed lower scores on the same examina- 
tions after completion of their study than did equivalent groups with- 
out such pre-tests. He concluded that the true-false test is of dubious 
value as an aid to learning, due to its false-suggestion effects and the 
fact that statements in declarative form are less stimulating to mental 
activity than direct questions, or even multiple-choice items. 

On the other hand, Lee and Symonds⁵ quote Kitch as finding, with 
high school classes in biology, fairly reliable differences in favor of an 
experimental group given objective test materials as learning aids. 


NATURE OF PRESENT STUDY 


The present investigation endeavors by a somewhat different 
technique to throw light upon the extent to which the taking of 
examinations in true-false form may give rise to specific learning over 
and above that to be expected otherwise. The outcomes here reported 
constitute part of a larger experiment conducted by the writer at the 
University of California with a beginning class in educational psychology. 
This class of upper division and graduate students met for three 
periods a week throughout the spring semester of 1934. Disregarding 
students eliminated because of absences or difficulties in matching, 
there remained one hundred forty-three in each section, or a total of 
two hundred eighty-six class members, constituting the subjects of the 
present study. 

Examinations in the course were based upon two textbooks in the 
hands of the students and upon lectures to which the class periods were 
devoted. At the opening of semester a comprehensive true-false test 
covering the entire scope of the course was administered to both 
sections. It was explained that this was designed as a survey to 
throw light upon the extent to which students were already familiar 
with the ground to be covered, and so make possible the avoidance of 
unnecessary repetition. Students were informed of their scores on this 
pre-test but not permitted to inspect their papers. 

The items composing the pre-test had been validated by eliminating 
those which failed to show a higher percentage of errors on the part of 
the poorest third as contrasted with the best third of students in a class 
at Cornell University the previous summer. Pains were taken to see 
that items selected were of sufficient difficulty to insure adequate 
“top,” or room for improvement during the course. A total of one 
hundred eighteen items were chosen, dealing with subject-matter to be 
covered in the first ten weeks of semester. Of these one hundred 
eighteen, a sampling of fifty-three was drawn at random for inclusion 
in mid-term examinations. Since the possible scores on these mid- 
terms totalled three hundred fifty-nine points, the fifty-three items 
repeated from the pre-test constituted less than fifteen per cent of the 
whole. For simplicity, the fifty-three pre-test items which occurred 
in one or another of the mid-term tests also will hereafter be referred to 
as A items, and the sixty-five not so included, as B items. 

One section of the class took their mid-terms in eight brief weekly 
tests. The other was given the same items in the form of two long 
examinations in the fifth and tenth weeks of semester. The somewhat 
superior showing of the weekly-test group has been reported elsewhere.* 
The present article is concerned with comparing the improvement 
made on the A and B items, and for this purpose the two class sections 
can be treated as one. 

* Keys, Noel: “The Influence on Learning and Retention of Weekly as Opposed 
to Monthly Tests.” Journal of Educational Psychology, Vol. XXV, 1934, pp. 
427-436. 

Five weeks after the giving of the last of the mid-term tests men- 
tioned, the pre-test was re-administered to the entire class, without 
warning of any sort. Since the students were taken entirely by 
surprise and without opportunity for special review or “cramming,” 
the results may be regarded as an unusually fair measure of the infor- 
mation retained from subject matter covered from five to fifteen weeks 
before. 


SUPERIOR RETENTION OF ITEMS OCCURRING IN MID-TERM TESTS 


Table I shows the improvement in performance on the A as com- 
pared with the B items.* Scoring was on the basis of number of right 
minus the number of wrong answers. Since the number of statements 
in the several categories varies widely, however, all scores in this and 
subsequent tables are expressed in terms of per cent correct, that is, 
the per cent which the net score (rights minus wrongs) is of the possible 
score, or total number of items under that head. The first line of 
Table I should be read: Of the fifty-three items which were included in 
the mid-semester examinations the class scored 17.2 per cent correct on 
the pre-test and 65.1 per cent on the end-test. The difference, or gain 





* It is, of course, apparent that the influence of mid-term tests upon learning 
and retention will vary widely with the degree to which such tests are utilized 
for teaching purposes. Corrected papers are often returned to students and 
“gone over” in class or quiz sections. At the other extreme would be an experimental 
procedure which rigorously excludes students from all knowledge even of 
their scores, and refuses to disclose the correct answer to any test item. The 
present experiment followed a middle course. The endeavor was to determine the 
influence upon learning of tests when these were not deliberately utilized as teaching 
aids. Test papers were, therefore, not returned to students, and there were no 
quiz sections of any sort throughout the semester. On the other hand, effort 
was made to maintain a “natural” atmosphere and prevent the suspicion that 
an experiment was in process. The instructor, accordingly, did not refuse to 
state the answer to specific test items when students asked for these from memory 
in lecture periods following a test. Less than ten per cent of the total number of 
test items, however, were answered in this way. Similarly, students who actively 
sought permission to see their corrected mid-term tests were allowed to do so, 
but the conditions were intentionally made inconvenient, so that, on the average 
test, but one in ten availed himself of this privilege. It should be remembered 
that the class had no thought of encountering the same items again. Moreover, the 
fifty-three items in the present study were in no way distinguished from the other 
three hundred six included in mid-term tests. 
TABLE I. GAIN IN SCORES OF TWO HUNDRED EIGHTY-SIX STUDENTS ON TRUE-FALSE 
ITEMS OCCURRING IN MID-TERM TESTS COMPARED WITH GAIN ON ITEMS NOT SO 
OCCURRING, AS MEASURED BY SCORES ON A TEST GIVEN NEAR THE OPENING OF 
SEMESTER AND REPEATED WITHOUT WARNING FIVE WEEKS AFTER THE LAST 
MID-TERM TEST 

Scores Shown in This and All Subsequent Tables Are in Terms of Per Cent Correct 
When Marked on the Basis of Right Answers Minus Wrongs 

                                    Number    Per cent    Per cent    Difference, or gain    Ratio
Class of item                       of such   correct     correct     (mean diff.            diff. /
                                    items     on pre-     on end-     ± PE diff.)            PE diff.
                                              test        test
(A) Items occurring in mid-term
    tests ........................    53       17.2        65.1        47.9 ± 0.66             73
(B) Items not occurring in
    mid-term tests ...............    65       14.0        42.6        28.6 ± 0.60             48
Total items ......................   118       15.4        52.7        37.3 ± 0.56             67

Difference (A − B) ...............              3.2        22.5        19.3
Probable error of difference .....              0.81                    0.89
Critical ratio = difference /
    PE diff. .....................              4.0                    21.7








TABLE II. GAIN IN SCORES OF TWO HUNDRED EIGHTY-SIX STUDENTS ON TRUE-FALSE 
ITEMS EMBODYING CERTAIN POPULAR FALLACIES OCCURRING IN MID-TERM TESTS 
AS COMPARED WITH GAIN ON SIMILAR ITEMS NOT SO INCLUDED 

                                    Number    Per cent    Per cent    Difference, or gain    Ratio
Class of item                       of such   correct     correct     (mean diff.            diff. /
                                    items     on pre-     on end-     ± PE diff.)            PE diff.
                                              test        test
(a) Fallacy items occurring
    in mid-term tests ............    13       24.6        76.9        52.3 ± 1.1              48
(b) Fallacy items not occurring
    in mid-term tests ............    12       10.0        62.5        52.5 ± 1.2              44
Total selected fallacy items .....    25       17.6        70.0        52.4 ± 1.1              48

Difference (a − b) ...............             14.6        14.4        −0.2
Probable error of difference .....              1.72                    1.63
Critical ratio = difference /
    PE diff. .....................              8.5                    −0.12
of 47.9, is seventy-three times the probable error of that difference.* 
On the other hand, the gain on the B items amounts to but forty- 
eight PE. 
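The scoring convention and the ratio column of Table I amount to two small computations, sketched here. The function names and the single student's counts (40 right, 10 wrong out of 53) are hypothetical; the gains and probable errors are those reported in Table I.

```python
def per_cent_correct(rights, wrongs, n_items):
    """Net score (rights minus wrongs) as a per cent of the possible score."""
    return 100.0 * (rights - wrongs) / n_items


def ratio_to_probable_error(difference, probable_error):
    """How many probable errors a difference amounts to."""
    return difference / probable_error


# Hypothetical student: 40 right, 10 wrong, 3 omitted on the 53 A items.
example_score = per_cent_correct(40, 10, 53)
print(round(example_score, 1))

# Gains and probable errors as reported in Table I.
ratio_A = ratio_to_probable_error(47.9, 0.66)
ratio_B = ratio_to_probable_error(28.6, 0.60)
print(round(ratio_A), round(ratio_B))
```

Omitted items count neither as right nor as wrong, so under this convention a student who guesses at random would be expected to score near zero.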

Of more significance are the comparisons in the lower half of the 
table. As shown in the second column, the difference between the 
mean pre-test score of 17.2 per cent on the A items and 14.0 per cent on 
the B items is 3.2. This amounts to four times the PE of that differ- 
ence. Since a difference of four PE or more is considered to be statis- 
tically reliable, we may be reasonably certain that for some reason the 
pre-test items which occurred also in the mid-term tests were somewhat 
less difficult originally than those which did not so occur. However, 
the fourth column shows that the difference between the gains on 
A and on B items amounts to 21.7 times its PE, or more than five 
times that observed on the pre-test scores. 

A simpler comparison would be to note that, whereas the proportion 
of A items correct on the pre-test was twenty-three per cent greater than 
that of the B items (17.2/14.0 = 1.23), the proportional gain on A items 
was sixty-seven per cent greater than that on the B’s (47.9/28.6 = 
1.67). Since the ratio of 1.67 to 1.23 is 1.36, the improvement shown 
on the items which occur in mid-term examinations is approximately 
thirty-six per cent greater than that which might be expected from 
difference in the initial difficulty of the two sets of statements. 
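The critical ratios and the proportional comparison above can be retraced in a few lines. The sketch uses only figures reported in Table I; the variable names are illustrative.

```python
# Figures reported in Table I (per cent correct, rights minus wrongs).
pre_A, pre_B = 17.2, 14.0
gain_A, gain_B = 47.9, 28.6
pe_pre_diff, pe_gain_diff = 0.81, 0.89

# Critical ratio: a difference divided by its probable error.
critical_ratio_pre = (pre_A - pre_B) / pe_pre_diff
critical_ratio_gain = (gain_A - gain_B) / pe_gain_diff

# Proportional comparison of A and B items before and after.
pre_ratio = pre_A / pre_B
gain_ratio = gain_A / gain_B
adjusted_ratio = gain_ratio / pre_ratio

print(round(critical_ratio_pre, 1), round(critical_ratio_gain, 1))
print(round(pre_ratio, 2), round(gain_ratio, 2), round(adjusted_ratio, 2))
```

The adjusted ratio is a rough correction: it assumes that gain would otherwise scale in proportion to the pre-test score, which is the assumption implicit in the comparison in the text.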


PERFORMANCE ON ITEMS EMBODYING CERTAIN POPULAR FALLACIES 


Critics of the true-false examination make much of the alleged 
harmful effect of false items through the fixing of erroneous ideas in the 
minds of students. It will be recalled that Jersild⁴ held this factor 
largely responsible for the poor showing of the group given a true-false 
test before study. The investigations of Ballard,¹ Remmers and Remmers,⁶ 
Sproule,⁸ and Roberts and Ruch,⁷ ranging from fifth grade to 
college classes, have found the net influence of true-false tests to be 
favorable in all cases, although the last named discovered negative 





* The probable errors of all differences in this study representing gains or 
losses as between initial and final tests on the same individuals have been 
computed by the formula 

$$\mathrm{PE} = \frac{.6745\,\sigma_{(A-B)}}{\sqrt{N}}$$

See Ezekiel, Mordecai: “Reply to Dr. Lindquist’s Further Note on Matched 
Groups.” Journal of Educational Psychology, Vol. XXIV, 1933, pp. 306-309. 
suggestion in approximately one answer out of seven. A secondary 
phase of the present study may serve to cast fresh light upon this issue. 

In the compilation of items for the pre-test described, the writer 
took pains to include twenty-five statements, each of which expresses a 
widely held but scientifically discredited idea. The twenty-five 
statements involved may be thought of as popular fallacies which the 
present course should tend to correct. Later inspection of the pre-test 
disclosed other statements which might fairly have been placed in 
the same category, but the applicability of the description to the 
twenty-five, at least, seems clear. The fallacies selected were interspersed 
with the ninety-three other items making up the pre-test of 
one hundred eighteen, and were divided like the rest. Thirteen fell 
among the items to be included in mid-term tests and twelve among 
those not so given. 

The following will serve to illustrate their general character. 
Under each head, the first two items cited are the two which proved to 
be most difficult on the basis of end-test scores, the third and fourth are 
items of median difficulty, and the fifth is the easiest of the group named. 





Items Embodying Popular Fallacies Included in Mid-term Tests 


1. Up to the age of six or eight, children’s learning depends almost wholly 
upon imitation; after that they learn by reason. 
2. Ambitious young people often suffer nervous breakdowns from over-study. 
3. Persons born blind have keener senses of touch and hearing than those with 
normal vision. 
4. Most differences between boys and girls as regards play interests and activities 
are due to differences in inborn tendencies of the two sexes. 
5. One can estimate an individual’s intelligence fairly accurately from a good 
photograph. 


Items Embodying Popular Fallacies Not Included in Mid-term Tests 


1. One’s capacity for learning most things is greater under fifteen years of 
age than over. 
2. The discomfort one feels in a “stuffy” room full of people is due to the 
excess of carbon dioxide in the air. 
3. A competent personnel worker can discover more about an applicant in a 
ten-minute interview than he could learn from many hours of testing. 
4. Feeble-minded children can often be rendered normal by the removal of 
diseased tonsils or adenoids. 
5. A number of eminent men were morons in childhood. 


Performance on the special group of twenty-five fallacies may be 
seen in Table II. From the final column it will be seen that the differ- 
ence in gains on a as compared with b items is comparatively small. 
This is brought out more clearly at the bottom of the table. Whereas 
the a items appear to be less than half as difficult as the b items on the 
occasion of the pre-test, and this difference has the high statistical 
significance of 8.5 PE, improvement during the semester is no greater 
on the a than on the b items, which do not occur in mid-term tests at 
all. The contrast between Tables I and II in this respect is striking. 
And, since the items selected for special study in Table II are a part of 
the total one hundred eighteen in Table I, it will be evident that the 
superior showing made in the items occurring in mid-term tests would 
have been even more marked in Table I, had the twenty-five fallacies 
mentioned been excluded from the A and B lists there compared. 


NEGATIVE SUGGESTION EFFECTS FROM FALLACIES 


In order to compare directly the influence of occurrence in mid-term 
tests in the case of these popular fallacies as contrasted with other 
items, the writer has assembled in Table III figures for the twenty-five 
fallacy items in juxtaposition with the data for the ninety-three items 
which are left of the one hundred eighteen after eliminating these 
twenty-five. To distinguish these “Other Items” from the total 
groups of Table I, the designations A’ and B’ have been used. The 
first four columns of Table III are analogous to those of previous tables. 


TABLE III. INFLUENCE OF OCCURRENCE OR NON-OCCURRENCE IN MID-TERM 
TESTS UPON INCREASE IN SCORES ON ITEMS EMBODYING CERTAIN POPULAR 
FALLACIES AS CONTRASTED WITH ITEMS OF OTHER TYPES 

                                               Per cent    Per cent    Differ-     Ratio of
Class of item                       Number     correct     correct     ence, or    gain to
                                    of items   on pre-     on end-     gain        pre-test
                                               test        test                    score
Popular fallacies ...............     25
 (a) Occurring in mid-term tests      13        24.6        76.9        52.3        2.13
 (b) Not occurring in mid-term
     tests ......................     12        10.0        62.5        52.5        5.25
Other items .....................     93
 (A’) Occurring in mid-term
     tests ......................     40        14.8        61.3        46.5        3.14
 (B’) Not occurring in mid-
     term tests .................     53        14.9        38.1        23.2        1.56























The significant figures are those of the final column. The pre-test 
score on each group of items has been used as a convenient index 
of their initial difficulty, and the gain shown during the semester 
expressed in terms of this difficulty. For example, the figure 2.13 for 
the a group signifies that the gain in scores on these items amounts to 
2.13 times the pre-test score. Any such index is, of course, open to 
criticism. Its use, however, serves to point the difference in the 
behavior of these fallacies as contrasted with other test items. Thus, it 
will be noted that in the case of the fallacies, this ratio is less than half 
as great for the a items, which have occurred in mid-term tests, 
as for the b items, which have not. With the “other items” the opposite 
is the case; that is, the A’ items show a relative gain which is twice that 
for the B’. 
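The index in the final column of Table III is simply the gain divided by the pre-test score. A short sketch using the pre-test and end-test figures reported in that table follows; the dictionary layout and function name are illustrative.

```python
# (per cent correct on pre-test, per cent correct on end-test), from Table III
scores = {
    "a": (24.6, 76.9),   # fallacies occurring in mid-term tests
    "b": (10.0, 62.5),   # fallacies not occurring in mid-term tests
    "A'": (14.8, 61.3),  # other items occurring in mid-term tests
    "B'": (14.9, 38.1),  # other items not occurring in mid-term tests
}


def gain_ratio(pre, end):
    """Gain expressed as a multiple of the pre-test score."""
    return (end - pre) / pre


ratios = {name: gain_ratio(pre, end) for name, (pre, end) in scores.items()}
for name, value in ratios.items():
    print(name, round(value, 2))
```

The index grows rapidly as the pre-test score approaches zero, so the high figure for the b items partly reflects their very low starting point; this is one form the criticism acknowledged in the text may take.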

While the number of items under the several heads is not large, the 
size of the experimental group gives these differences indisputable 
statistical significance. The conclusion seems inescapable that under 
conditions of the present experiment, the occurrence of true-false items 
in mid-term tests makes for increased learning, as a rule. In the case 
of statements expressing certain common misconceptions, however, 
occurrence in mid-term tests seems to reduce the improvement which 
might otherwise be anticipated. There is, then, strong indication of 
negative or false suggestion effects, from encounter with statements 
of the latter type. 


TRUE-FALSE STATEMENTS AND THE SCIENTIFIC ATTITUDE 


Despite the influence of the above factor, it should be noted that 
improvement in scores on the fallacy items as a whole compares very 
favorably with that on “other” types. False suggestion or no, it 
seems clear that much learning has taken place on the points involved. 
In the writer’s opinion, the cultivation of a healthy skepticism toward 
widely held ideas unsupported by scientific evidence should be one of 
the most valuable outcomes of university courses. To those who share 
this view, an occasional residuum of erroneous ideas due to plausible 
misstatements in true-false tests may seem none too high a price to pay 
for a critical habit of mind. 

That such an attitude was fostered by the course under considera- 
tion is suggested by a third set of data from the present experiment. 
In constructing the pre-test, the writer deliberately inserted a small 
group of false statements having some bearing on psychology, but with 
no place in the present course; e.g., “Silent men are usually deep thinkers” 
and “Children from the marriage of first cousins are almost 
invariably defective.” These statements were additional to the one 
hundred eighteen items summarized in Table I; they occurred in none 
of the mid-term tests, nor was the information involved touched upon 
at any point in the textbooks or lectures of the course. It should be 
added that no outside reading was assigned or suggested in this course. 
Yet Table IV shows that scores on these items made an appreciable and 
highly significant gain, amounting to nearly fourteen times the prob- 
able error. The improvement noted might be due to learning from 
other courses carried concurrently; but this is unlikely, with two hun- 
dred eighty-six students representing highly diversified programs. 
The most probable other explanation would seem to be the growth of a 
somewhat generalized habit of challenging popular ideas without 
known scientific basis. More extended experimentation on the trans- 
fer of such critical attitudes is highly desirable. 


TABLE IV. IMPROVEMENT IN SCORES ON TRUE-FALSE STATEMENTS EMBODYING 
FALLACIES OF PSYCHOLOGICAL BEARING BUT NOWHERE TOUCHED UPON IN 
THE READING, LECTURES, OR MID-TERM TESTS OF THE PRESENT COURSE 

                                    Number    Per cent    Per cent    Difference, or gain    Ratio
                                    of such   correct     correct     (mean diff.            diff. /
                                    items     on pre-     on end-     ± PE diff.)            PE diff.
                                              test        test
Items not touched upon
    in the course ................     5       46.0        68.0        22.0 ± 1.6             13.8




















SUMMARY AND CONCLUSIONS 


A re-test over the first ten weeks of a course in educational psychol- 
ogy was given to two hundred eighty-six students without warning, 
five weeks after completion of that section of the subject. The results 
showed: 

1. That improvement in scores on true-false statements occurring 
in mid-term tests was sixty-seven per cent greater than on items not 
so included, or thirty-six per cent greater after allowing for apparent 
differences in difficulty of the two sets of items. 

2. That the opposite tendency was observable with true-false 
statements expressing certain popular fallacies in this field, indicating 
the presence of negative, or false suggestion, effects from such items. 

3. That despite such negative suggestion effects, improvement in 
scores on twenty-five popular fallacies as a whole exceeded that on the 
remaining true-false items. 
4. That scores on similar fallacies nowhere touched upon in the 
course also showed a highly significant improvement, suggesting the 
development of a generally critical attitude toward unproved asser- 
tions of a psychological nature. 


REFERENCES 


1. Ballard, F. B.: The New Examiner. London: Hodder and Stoughton, 1924, 
pp. 96-98. 

2. Cocks, A. W.: The Pedagogical Value of the True-false Examination. Baltimore: 
Warwick and York, 1929. 

3. Hertzberg, O. E., J. D. Heilman, and H. W. Leuenberger: “The Value of 
Objective Tests as Teaching Devices in Educational Psychology Class.” 
Journal of Educational Psychology, Vol. XXIII, 1932, pp. 371-380. 

4. Jersild, Arthur: “Examination as an Aid to Learning.” Journal of Educational 
Psychology, Vol. XX, 1929, pp. 602-609. 

5. Lee, J. M., and P. M. Symonds: “New Type or Objective Tests: A Summary of 
Recent Investigations.” Journal of Educational Psychology, Vol. XXV, 
1934, pp. 161-184. 

6. Remmers, H. H., and E. M. Remmers: “The Negative Suggestion Effect of 
True-False Examination Questions.” Journal of Educational Psychology, 
Vol. XVII, 1925, pp. 52-56. 

7. Roberts, H. M., and G. M. Ruch: “The Negative Suggestion Effect of True-
False Tests.” Journal of Educational Research, Vol. XXI, 1928, pp. 112-116. 

8. Sproule, Chester: “Suggestion Effects of the True-false Test.” Journal of 
Educational Psychology, Vol. XXV, 1934, pp. 281-285. 

9. Turney, A. H.: “The Effect of Frequent Short Objective Tests upon the 
Achievement of College Students in Educational Psychology.” School and 
Society, Vol. XXXIII, June 6, 1931, pp. 760-762. 
















AN EXPERIMENT ON THE INFLUENCE OF 
PRELIMINARY SKIMMING ON READING 


HOWARD YALE McCLUSKY 
University of Michigan 


The Gestalt school of psychology emphasizes the value of an 
initial impression of wholeness before an analytical attack is made 
on the parts which derive their meaning from the whole. Applied 
to reading, this idea is contained partially in the suggestion that a 
normal reading of a passage of subject-matter should be preceded by 
a skimming over-view of the material. As far as the writer is aware, 
however, this suggestion has never been put to an experimental test 
in a reading situation. The only quantitative study of skimming 
that appears in the scientific literature is reported by Whipple and 
Curtis.¹ While their investigation involved only one rapid contact 
with the reading passage and is therefore somewhat different from 
the experiment reported in this discussion, nevertheless certain of 
their conclusions are pertinent for the present study: It is valuable 
to know that skimming is about twice as rapid as normal reading, 
and that preferred rates of skimming lead to more effective results 
than forced rates. Furthermore, skimming involves a warming up 
process; it is probably influenced by familiarity with the material, and 
different individuals employ different methods. 


PURPOSE 


The purpose of this experiment may be stated as being that of 
determining the influence of a preliminary skimming on a normal 
reading of the same passage immediately following the skimming. 


SUBJECTS AND MATERIALS 


The experimental materials consisted of a six hundred eleven word 
passage from the field of sociology dealing with the subject of “Woman 
and Marriage.” The passage was preserved in its original continuous 
form so as to conform as nearly as possible to a normal reading situa- 
tion. It was followed by a series of twenty-one new-type objective 
questions which were employed to measure comprehension. Rate 
was computed in terms of the time required to read the passage 
once at a normal speed. The subjects were sophomores at the University 
of Michigan. The experiment was conducted in small groups 
of eighteen or fewer, allowing an opportunity for a careful administration 
of the procedure. On another occasion the subjects had taken the 
Otis Self-Administering Test of Mental Ability: Higher Examination, 
Form A, and the Burgess reading scale. 

¹ Whipple, G. M. and Curtis, J. N.: “Preliminary Investigation of Skimming in 
Reading.” Journal of Educational Psychology, Vol. VIII, June, 1917, pp. 33-49. 


PROCEDURE 


The steps of the procedure were briefly as follows: 

1. The subjects were comfortably seated at tables in the laboratory, 
each one with a supply of practice and experimental materials. 

2. They were first given practice in skimming for about fifteen 
minutes. 

3. Next, they skimmed the experimental test for twenty-five 
seconds, time being kept by the experimenter using a stop watch. 

4. Finally they read the experimental test normally and made 
appropriate responses on the comprehension test covering the passage. 


THE NATURE OF THE PRACTICE IN SKIMMING 


The writer called attention to the patterns in which reading 
material usually appears. He made special reference to the sign 
posts, the topic and clincher sentences of paragraphs, and the meaning 
in the body of the paragraphs. He suggested that different methods 
of skimming could be based on the conscious and rapid recognition of 
these language devices. Accordingly practice was given in three 
types of skimming: The first consisted of glancing at the first (topic) 
and last (clincher or summary) sentences of each paragraph; the 
second involved running the eye down the middle of the page; and 
the third involved random selection of sign posts and the hearts of 
sentences. 

At the end of each skimming exercise, the experimenter asked the 
group some questions covering the material to make certain that 
attention was being directed to the meaning as well as the process. 
The skimming was practiced in short periods of twenty-five seconds 
in order to compel a rapid reaction and to train the subject to judge 
the amount of space that could be covered in the time allowed for the 
preliminary skimming of the experimental test. 

The matched-pair technique of experimentation was employed. 
Each member of the experimental group using the preliminary skimming 
method was matched on the basis of scores on the Otis intelligence 
test and the Burgess reading scale with a member of the control 
group who read the experimental passage normally without the use 
of a preliminary skimming period. The data from one hundred 
eighteen subjects forming fifty-nine “matched pairs” are summarized 
in the averages of Table I. 


TABLE I.—THE INFLUENCE OF A PRELIMINARY SKIMMING FOR TWENTY-FIVE 
SECONDS ON THE RATE AND COMPREHENSION OF READING MATERIAL FROM 
SOCIOLOGY (WOMAN AND MARRIAGE) 

           No skimming                                       Skimming 
           | Intelligence | Reading | Rate | Comprehension | Intelligence | Reading | Rate | Comprehension 
Average    | 57.86        | 13.72   | 2.54 | 16.37         | 57.95        | 13.52   | 2.12 | 16.18 

In the rate scores of the above table the first digit refers to minutes and the second and third 
digits refer to seconds. For example, the average time required by the members of the control 
group to read the experimental passage was two minutes and fifty-four seconds (two hundred 
fifty-four). 


The rate and comprehension scores of the experimental subjects 
in the preceding and following tables are measures of their performance 
in the normal reading after they had skimmed the material and do 
not include the twenty-five seconds involved in the preliminary skimming 
exercise. The averages indicate that the group employing the preliminary 
skimming procedures read the article about seven-tenths of 
a minute or about forty-two seconds more rapidly than the other 
group, and made within one-fifth of a point as good a performance 
in comprehension. 
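
The arithmetic behind the forty-two seconds can be checked with a minimal sketch. It assumes only the convention stated in the note to Table I (first digit minutes, last two digits seconds); the function name `rate_to_seconds` is hypothetical and is introduced here purely for illustration.

```python
def rate_to_seconds(score: str) -> int:
    """Convert a rate score written as M.SS or MSS (e.g. '2.54' or '254') to seconds."""
    digits = score.replace(".", "").strip()
    minutes = int(digits[:-2] or 0)   # everything before the last two digits
    seconds = int(digits[-2:])        # last two digits are seconds
    if seconds >= 60:
        raise ValueError(f"not a valid M.SS rate score: {score!r}")
    return minutes * 60 + seconds


# Average rate scores taken from Table I.
control_seconds = rate_to_seconds("2.54")    # no-skimming group
skimming_seconds = rate_to_seconds("2.12")   # preliminary skimming group
saving = control_seconds - skimming_seconds  # seconds saved in the normal reading

print(control_seconds, skimming_seconds, saving)
```

Because the scores are minutes and seconds rather than decimals, subtracting 2.12 from 2.54 directly would be misleading; the conversion to seconds avoids that.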

At least in this experiment, the evidence clearly indicates that, 
on the average, a preliminary skimming enhances the rate of normal 
reading without appreciably diminishing the comprehension. 

The original data from which Table I was constructed may be 
further analyzed. For this purpose the pairs of subjects will be 
classified into four types. The first type will consist of those pairs 
in which the members of the preliminary skimming group made 
either the same or a higher score in both rate and comprehension 
than the members of the non-skimming group. The second type is 
composed of those members of the preliminary skimming group who 
made lower scores in both rate and comprehension than the matched 
partner of the non-skimming group. The third type is that in which 
the preliminary skimming subject made the same or a higher rate 
score, and a lower comprehension score, and a fourth type includes 
those preliminary skimming subjects who made lower rate, and the 
same or higher comprehension scores. 
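
The four-way classification just described can be sketched as a small function. This is an illustration only, not the writer's procedure: the name `classify_pair` is hypothetical, reading times are taken in seconds, and a "higher rate score" is assumed to mean a shorter reading time, which is how the tables that follow appear to use the term.

```python
def classify_pair(skim_time: int, control_time: int,
                  skim_comp: int, control_comp: int) -> int:
    """Return the type (1 to 4) of a matched pair.

    Times are reading times in seconds; a shorter time is treated as a
    higher rate score (an assumption inferred from the tables).
    """
    rate_same_or_higher = skim_time <= control_time
    comp_same_or_higher = skim_comp >= control_comp
    if rate_same_or_higher and comp_same_or_higher:
        return 1  # same or higher in both rate and comprehension
    if not rate_same_or_higher and not comp_same_or_higher:
        return 2  # lower in both
    if rate_same_or_higher:
        return 3  # same or higher rate, lower comprehension
    return 4      # lower rate, same or higher comprehension


# One pair from each of Tables II to V, with M.SS rates converted to seconds.
print(classify_pair(155, 235, 17, 17))  # C. M. vs. A. C.
print(classify_pair(180, 130, 16, 20))  # L. H. vs. S. I.
print(classify_pair(165, 215, 12, 14))  # C. R. vs. F. C.
print(classify_pair(235, 150, 17, 15))  # S. A. vs. H. C. E.
```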

Data for the first type, in which the preliminary skimming subject 
is either the same in one and higher in the other or higher in both 
rate and comprehension, are contained in the ensuing table. 

Table II indicates clearly that in a large number of cases a short 
period of preliminary skimming appreciably enhances the compre- 
hension of the material read and quite significantly accelerates the 
rate of reading, making for greater efficiency even when the time 
for the preliminary skimming is added to the time required for the 
normal reading. 


TABLE II.—PRELIMINARY SKIMMING SUBJECTS WHOSE SCORES ARE EITHER THE 
SAME AND HIGHER OR ARE HIGHER IN BOTH RATE AND COMPREHENSION 
THAN THOSE OF THEIR MATCHED PARTNERS 

No skimming                                                  Skimming 
Individual | Intelligence | Reading | Rate | Comprehension | Individual | Intelligence | Reading | Rate | Comprehension 
A. C.      | 54           | 16      | 355  | 17            | C. M.      | 55           | 16      | 235  | 17 
B. S.      | 60           | 10      | 242  | 11            | E. F.      | 59           | 9       | 200  | 19 
G. C.      | 51           | 8       | 325  | 15            | B. I.      | 51           | 11      | 205  | 15 
M. W.      | 60           | 15      | 226  | 11            | E. H.      | 61           | 15      | 158  | 15 
S. H.      | 61           | 9       | 300  | 14            | B. I.      | 64           | 10      | 220  | 14 
M. R.      | 62           | 16      | 325  | 18            | W. J.      | 62           | 17      | 150  | 18 
L. W.      | 57           | 14      | 310  | 13            | W. L.      | 56           | 12      | 220  | 13 
K. M.      | 46           | 14      | 325  | 13            | V. G. E.   | 47           | 13      | 140  | 17 
P. I.      | 48           | 11      | 415  | 13            | H. N. R.   | 49           | 9       | 310  | 16 
C. E.      | 63           | 17      | 220  | 15            | B. M.      | 62           | 17      | 150  | 19 
B. L.      | 58           | 13      | 300  | 17            | S. C.      | 58           | 13      | 240  | 18 
S. R. C.   | 61           | 11      | 335  | 15            | W. M.      | 63           | 13      | 155  | 19 
S. B.      | 52           | 13      | 310  | 14            | H. J.      | 54           | 13      | 205  | 18 
O. L.      | 73           | 16      | 245  | 15            | H. E.      | 7[?]         | 14      | 240  | 18 
B. H. E.   | 59           | 20      | 235  | 14            | F. F.      | 62           | 20      | 130  | 17 
W. W. A.   | 63           | 10      | 230  | 18            | H. L. A.   | 63           | 13      | 120  | 18 
P. M.      | 64           | 13      | 410  | 19            | L. D.      | 64           | 12      | 205  | 20 
D. R.      | 66           | 16      | 315  | 16            | S. D.      | 65           | 15      | 225  | 17 
L. C. L.   | 53           | 12      | 240  | 14            | H. E.      | 53           | 16      | 155  | 17 
S. F.      | 51           | 15      | 200  | 14            | Y. M.      | 51           | 16      | 120  | 17 
R. C.      | 64           | 16      | 310  | 17            | S. D.      | 64           | 14      | 205  | 17 
T. A.      | 64           | 9       | 250  | 16            | F. R.      | 67           | 9       | 215  | 16 
H. J.      | 50           | 11      | 250  | 16            | K. M.      | 50           | 12      | 215  | 18 
Average    | 58.2         | 13.2    | 3.03 | 15            | Average    | 58.8         | 13.4    | 2.06 | 17.08 

Data for the second type, in which the preliminary skimming 
subject is lower in both rate and comprehension, are contained in the 
following table. 


TABLE III.—PRELIMINARY SKIMMING SUBJECTS WHOSE SCORES ARE LOWER IN 
BOTH RATE AND COMPREHENSION THAN THOSE OF THEIR MATCHED 
PARTNERS 

No skimming                                                  Skimming 
Individual | Intelligence | Reading | Rate | Comprehension | Individual | Intelligence | Reading | Rate | Comprehension 
S. I.      | 52           | 12      | 210  | 20            | L. H.      | 53           | 12      | 300  | 16 
H. R.      | 73           | 17      | 225  | 18            | G. L.      | 73           | 13      | 230  | 14 
L. J.      | 59           | 10      | 325  | 17            | B. A.      | 57           | 11      | 335  | 14 
F. N. D.   | 41           | 15      | 210  | 17            | S. E.      | 40           | 14      | 225  | 15 
McG. D.    | 57           | 16      | 155  | 18            | B. C.      | 56           | 14      | 235  | 17 
Average    | 56.4         | 14      | 2.25 | 18            | Average    | 55.8         | 12.8    | 2.49 | 15.2 

The first conspicuous item revealed by a comparison of the two 
preceding tables is that the second type contains far fewer subjects 
than the first type. The averages also indicate that the difference 
between the rate scores is not as great in the second type as it is in 
the first type, while the difference between the comprehension scores 
is greater. This means that where the rate is accelerated the gain is 
somewhat more pronounced than the loss where it is retarded, while 
the opposite is true of comprehension. 

Data for the third type, in which the preliminary skimming subject 
has either the same or a higher rate score and a lower comprehension 
score, are contained in Table IV. 

The averages indicate a convincing superiority in rate for the 
skimming subject. The extent of this superiority may be seen in 
the fact that the status of only six subjects (G. F., B. H., I. W., W. G., 
S. M., P. G. E.) would be altered if the twenty-five seconds required 
for the preliminary skimming were added to the normal reading 
time. The averages also reveal an appreciable inferiority in comprehension 
for the skimming subject. An inspection of the individual 
cases shows that there are fifteen subjects whose comprehension is 
inferior by two points or less, but also four subjects who are inferior 
by five or more points. These facts might be interpreted to mean that 
the preliminary skimming preceding a normal reading in many cases 
does not produce a serious loss in comprehension, while in a few it 
interferes seriously with the understanding of the material read. By 
far the greatest variation in performance in comprehension occurs in 
the type of case reported in Table IV. 


TABLE IV.—PRELIMINARY SKIMMING SUBJECTS WHOSE RATE SCORES ARE EITHER 
THE SAME OR HIGHER AND WHOSE COMPREHENSION SCORES ARE LOWER 
THAN THEIR MATCHED PARTNERS 

No skimming                                                  Skimming 
Individual | Intelligence | Reading | Rate | Comprehension | Individual | Intelligence | Reading | Rate | Comprehension 
F. C.      | 56           | 10      | 335  | 14            | C. R.      | 56           | 10      | 245  | 12 
G. A.      | 57           | 13      | 355  | 22            | J. M.      | 57           | 12      | 225  | 17 
H. E.      | 52           | 10      | 220  | 16            | G. F.      | 52           | 9       | 205  | 14 
H. P.      | 62           | 15      | 300  | 17            | K. C.      | 62           | 15      | 225  | 16 
K. E.      | 51           | 12      | 300  | 11            | G. M.      | 49           | 10      | 155  | 10 
K. G.      | 59           | 15      | 240  | 18            | C. S. H.   | 58           | 15      | 205  | 16 
McK. S.    | 70           | 18      | 230  | 16            | B. H.      | 69           | 18      | 205  | 15 
N. L.      | 65           | 18      | 330  | 18            | C. E.      | 66           | 18      | 140  | 17 
O. N.      | 55           | 12      | 310  | 18            | P. G. E.   | 55           | 12      | 155  | 15 
B. H.      | 49           | 13      | 305  | 20            | C. M.      | 49           | 15      | 135  | 18 
R. H.      | 49           | 12      | 440  | 20            | A. A.      | 49           | 12      | 245  | 14 
G. B.      | 69           | 18      | 335  | 18            | G. L.      | 68           | 18      | 130  | 17 
B. J. R.   | 65           | 17      | 355  | 18            | A. D.      | 65           | 16      | 220  | 16 
B. M.      | 65           | 12      | 335  | 17            | J. S.      | 65           | 13      | 225  | 14 
S. W.      | 67           | 18      | 435  | 18            | F. V.      | 66           | 20      | 155  | 16 
S. M.      | 57           | [?]     | 240  | 19            | G. B.      | 57           | 11      | 150  | 14 
G. M.      | 64           | 19      | 210  | 17            | I. W.      | 65           | 19      | 200  | 16 
W. W.      | 53           | 12      | 200  | 18            | W. G.      | 54           | 11      | 135  | 15 
H. C. A.   | 58           | 17      | 350  | 19            | L. F.      | 56           | 15      | 230  | 18 
C. A.      | 43           | 14      | 230  | 17            | G. J.      | 40           | 14      | 200  | 13 
S. J. E.   | 59           | 16      | 305  | 18            | B. R.      | 59           | 14      | 155  | 16 
P. V.      | 61           | 12      | 235  | 19            | L. C.      | 60           | 14      | 155  | 15 
V. T. M.   | 70           | 14      | 250  | 19            | W. G.      | 70           | 12      | 145  | 14 
S. R.      | 62           | 12      | 220  | 19            | S. M.      | 62           | 15      | 215  | 18 
D. A.      | 53           | 12      | 155  | 19            | P. G. E.   | 55           | 12      | 155  | 15 
Average    | 58.8         | 14.1    | 3.05 | 17.8          | Average    | 58.56        | 14      | 2.02 | 15.2 

The fourth and final type is composed of those preliminary skimming 
subjects who made a lower rate score and the same or a higher comprehension 
score than their matched partners. Data for this type are 
presented in Table V. 

The first apparent item is the fact that there are fewer cases in the 
fourth type than in the third type. The second obvious point is that 
the disadvantage in the rate average for the preliminary skimming 
subject in this type is not as marked as the advantage for the skimming 
subject in the third type. In any case, the data of Table V show 
that there were a few subjects who were retarded in rate but improved 
in comprehension as a result of a rapid skimming preliminary to the 
normal reading of the material covered. 


TABLE V.—PRELIMINARY SKIMMING SUBJECTS WHOSE RATE SCORES ARE LOWER 
AND WHOSE COMPREHENSION SCORES ARE EITHER THE SAME OR HIGHER 
THAN THOSE OF THEIR MATCHED PARTNERS 

No skimming                                                  Skimming 
Individual | Intelligence | Reading | Rate | Comprehension | Individual | Intelligence | Reading | Rate | Comprehension 
H. C. E.   | 40           | 12      | 230  | 15            | S. A.      | 41           | 11      | 355  | 17 
H. H.      | 53           | 14      | 245  | 10            | W. M.      | 54           | 14      | 250  | 17 
F. R.      | 57           | 13      | 235  | 18            | W. R.      | 57           | 11      | 240  | 18 
B. I.      | 54           | 15      | 200  | 10            | B. R.      | 54           | 14      | 205  | 14 
B. E. H.   | 65           | 13      | 200  | 17            | G. M.      | 63           | 13      | 245  | 19 
Average    | 53.8         | 13.4    | 2.22 | 14            | Average    | 53.8         | 12.6    | 2.51 | 17 


INTERPRETATION 


The facts of this experiment contained in the five preceding tables 
may now be assembled for purposes of interpretation. At the outset 
there are a few negative cases which deserve attention. The subject 
may have been peculiarly retarded by the nature of the content 
covered, or he may not have benefited from the skimming practice 
sufficiently to influence performance in the experiment. That is, the 
fifteen-minute practice and the preliminary skimming in the experiment 
itself may have served to confuse and impede the reader instead of 
assisting him. It is the judgment of the writer that the latter explanation 
is the more plausible of the two. Surprisingly cogent evidence 
for this viewpoint may be found in the unsolicited introspective report 
and the performance of one subject who made a much lower comprehension 
and rate score than his matched partner (see pair S. I.-L. H., 
Table III). He wrote on the back of his reading test that the skimming 
“mixed him up” and that he “needed more practice” in it before he 
would be able to apply it effectively to his reading. This is the only 
introspective report of this nature available, but since there are so 
few cases in which both rate and comprehension were retarded, it 
doubtless is a clue as to what occurred in the remaining cases. 

The most striking fact revealed by the experiment is that pre- 
liminary skimming in a great majority of cases enhances the rate of 
normal reading. Support for this conclusion is found in the fact that 
forty-eight of the fifty-nine preliminary skimming subjects read more 
rapidly than their matched partners. Furthermore, forty of these 
forty-eight subjects read so much faster in their normal reading than their 
partners that they were superior in rate performance even when the 
twenty-five seconds of the preliminary skimming are added to the time 
for the regular reading. And finally even in the negative cases, rate 
was never as greatly retarded as it was accelerated in the cases of 
improvement. The results for comprehension are not as pronounced. 
In about an equal number of cases the skimming subject is either 
higher or lower than his matched partner. In neither case are the 
differences extreme except in some instances where an extremely 
rapid rate has a compensation in a rather low comprehension score. 
By way of a general statement it might be said then that a preliminary 
skimming accelerates the rate appreciably with no significant changes 
in comprehension, except in a few cases where a very rapid rate tends 
to reduce the comprehension of the material read. 
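
The check on net efficiency described above (whether a skimming subject remains faster once the twenty-five seconds of skimming are added) can be sketched as follows. The function name `still_faster` is hypothetical, and times are assumed to be in seconds; the two pairs shown are taken from Table IV.

```python
SKIM_SECONDS = 25  # length of the preliminary skimming period


def still_faster(control_time: int, skim_read_time: int,
                 skim_seconds: int = SKIM_SECONDS) -> bool:
    """True if the skimming subject's total time (skim plus normal reading)
    is still shorter than the matched partner's reading time."""
    return skim_read_time + skim_seconds < control_time


# N. L. (3 min 30 sec) vs. C. E. (1 min 40 sec): the advantage survives.
print(still_faster(control_time=210, skim_read_time=100))

# H. E. (2 min 20 sec) vs. G. F. (2 min 5 sec): the advantage disappears.
print(still_faster(control_time=140, skim_read_time=125))
```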

The tendency for an accelerated rate as a result of preliminary 
skimming is significantly pronounced. Three conditions might be 
responsible for this fact. The first one is that the practice in skimming 
added to the period of preliminary skimming may have set up a 
temporary acceleration of the motor processes in reading which had 
sufficient momentum to speed up the normal reading of the material. 
The second condition may have been the influence of an attitude of 
looking for the broader meanings and ideas instead of the details of 
the material. And the third condition may have been the meaningful 
background secured through the first skimming of the material 
enabling the subject to read more rapidly when he covered the material 
in a normal reading. 

In closing the comments on this experiment one caution should be 
presented. While this experiment was repeated three times, in small 
groups, in all instances it employed the same reading material. It 
will be remembered that this material was taken from the social 
sciences, and discusses the topic of woman and marriage about which 
most college students have some opinion. At least it is not an alien 
topic, presented in technical terms. The question naturally arises 
then: would the same results have been secured with another type of 
subject matter? It should be said that this problem was not attempted 
and so remains as a topic for further investigation. The conclusions 
of this experiment, however, seem to the writer to warrant the state- 
ment that preliminary skimming for a majority of subjects tends to 
facilitate the rate of reading when the material is such that a pre- 
liminary skimming will give a fair estimate of the content. Further 
than that extreme caution should be exercised in the application of the 
results of this experiment. 


SUMMARY STATEMENT OF CONCLUSIONS 


1. Preliminary skimming accelerates the rate of normal reading in 
a large majority of cases. 

2. Preliminary skimming increased the comprehension of normal 
reading about as often as it decreased it, with the qualification that 
a few cases of decreased comprehension are more striking than the 
cases of increased comprehension. 

3. In a very few cases preliminary skimming actually interferes 
with normal reading. 

4. The increase in rate is probably due to a temporary acceleration 
of the motor elements, or an attitude of reading for larger ideas as 
opposed to details, or the meaningful background that comes as the 
product of the operation of such an attitude. 

5. The conclusions of this experiment apply primarily to subject 
matter for which the reader possesses a background of opinion, 
experience and general knowledge, and should be applied with great 
caution to reading performance involving other types of material. 


THE CULTURAL CONTENT OF GENERAL INTEREST 
MAGAZINES 


WINONA L. MORGAN AND ALICE M. LEAHY 
Institute of Child Welfare, University of Minnesota 


What magazines people read has long been a question of inquiry 
by educators and librarians. Enumerations and classifications accord- 
ing to various investigators’ theories of the desirable for certain 
age groups, for certain occupational interests, and for certain cultural 
interests have been carefully prepared. But nowhere in the literature 
is there an attempt to go beyond mere classification of the individual 
magazine in reference to a particular topic such as travel, mechanics, 
sports, politics, fiction, etc. It is true that within arbitrary categories 
note has been taken of the magazines chosen by the patrons of our 
public libraries and judgments of popularity have been determined 
by circulation figures. Whether or not the most frequently selected 
magazine is high or low in the quality of its articles is a matter of 
individual opinion. That there is a difference in quality begs for no 
defense, although this difference is less obvious than such differences, 
for example, as space given to advertising, style of type, and the 
amount of photography included. 

The purpose of the present study is to secure from competent 
persons a composite judgment of the cultural content of general 
interest magazines. The definition of “cultural” employed is the 
generally accepted meaning given in Funk and Wagnalls, New Standard 
Dictionary, i.e., “pertaining to the degree of refinement of mind, 
morals, or tastes.” Hence, our concern is the degree of refinement 
in the opinion of the judges that a magazine reflects. 

The magazines to be rated were selected on the basis of their 
general interest. All technical magazines, children’s magazines, and 
others that would be of interest largely to special groups were omitted. 
For example, the Journal of the American Medical Association was 
omitted as being of interest chiefly to a particular profession while 
Hygeia was included as being of general interest to the reading public. 
The magazines were not chosen at random, but were selected to 
represent various fields of interest according to the experience of the 
investigators. A list of the seventy-four magazines used is given in 
Table II. 

The results of the study are based on the judgments of fifty individuals, 
twenty-five men and twenty-five women, all of whom rated 
at least sixty of the seventy-four magazines. If a judge rated fewer 
than sixty, his ratings were not used, since competency to judge was 
arbitrarily based on familiarity with at least eighty per cent of the 
magazines. Sixty-five judges were used in order to obtain fifty 
who rated enough magazines to meet the criterion. 
All of the judges were trained persons; twenty-six were engaged in 
academic work, fifteen in non-academic professional work, four in 
business, and five were housewives. Table I gives the special fields 
of interest represented by the judges, with the number of men and 
women in each group. Ten of the judges are listed in Who’s Who in 
America, and sixteen of them have the Ph. D. degree. In general, the 
judges were selected because they were known to be well read persons. 


TABLE I.—A DESCRIPTION OF THE JUDGES 

Field of special interest         | Number of men | Number of women | Total 
[illegible]                       | 2             | 1               | 3 
[illegible]                       | 3             | 3               | 6 
[illegible]                       | [?]           | [?]             | 4 
[illegible]                       | 1             | -               | 1 
[illegible]                       | 2             | 1               | 3 
[illegible]                       | [?]           | [?]             | 1 
[illegible]                       | [?]           | [?]             | 1 
[illegible]                       | 4             | -               | 4 
[illegible]                       | [?]           | [?]             | 1 
[illegible]                       | 1             | 2               | 3 
[illegible]                       | 1             | 5               | 6 
[illegible]                       | [?]           | [?]             | 2 
[illegible]                       | [?]           | [?]             | 1 
[illegible]                       | 2             | 2               | 4 
[illegible]                       | 1             | 1               | 2 
[illegible]                       | [?]           | [?]             | 1 
[illegible]                       | 1             | -               | 1 
[illegible]                       | -             | 5               | 5 
Teaching (other than college)     | [?]           | [?]             | 1 

Total                             | 25            | 25              | 50 

The actual method used with each judge was to present him with 
the directions recorded below and a pack of three by five cards. On 
each card was written the name of the magazine to be rated, and these 
were arranged alphabetically. Ten other three by five cards were 
labeled I through IX, with the tenth card labeled discard. 

Directions.—1. Arrange the cards labeled I, II, III, IV, V, VI, VII, VIII, 
IX, in regular order before you. 
2. Read completely through the cards bearing the names of the magazines. 
3. Set aside the magazines which you do not know by reputation at least. 
4. Before card II place the magazines which in your opinion indicate a high 
cultural level; before card V place the magazines which indicate medium culture; 
and before card VIII those which indicate below medium culture. 
5. Take the pack of cards which you placed before card II and consider them 
a second time. On card I place all titles which in your judgment indicate an 
exceedingly high cultural level, on card II the next highest, and on card III the 
next highest. 
6. Take the pack of cards which you placed before card V and reconsider them. 
On card IV place all the titles which in your judgment indicate the highest cultural 
level of this pack, on card V the next highest, and on card VI the next highest. 
7. Take the pack of cards which you placed before card VIII and reconsider 
them. On card VII place all titles which in your judgment indicate the highest 
cultural level of this pack, on card VIII the next highest, and on card IX the next. 
Note.—1. The cards may be shifted or re-classified as many times as you choose. 
2. When you are through sorting you will have nine piles arranged in order 
from I, the highest culture level, to IX, the lowest. 
3. It is not expected that you will have the same number of cards on each pile. 
4. When you are through sorting bind each pack with the cover card on top. 

The judges took from ten to twenty minutes to rate the magazines. 
A few judges reconsidered their ratings, but most of them were content 
with their first judgments. 

In order to check on the degree of acquaintance the judges had 
with the magazines, seventeen of them were asked to indicate which 
magazines had been read within the last three months, which ones 
previous to the last three months, and which they knew by reputation 
only. It was found that twenty-four per cent of the judgments 
were made on the basis of having read the magazine within the last 
three months, 45.7 per cent on the basis of having read it previous to 
the last three months, and 18.7 per cent by reputation only, while 
10.9 per cent were discarded. Thus almost three-fourths of the 
judgments were made on the basis of actual knowledge of the magazine. 
Approximately nineteen per cent, however, were made on the basis of 
reputation only, and these were largely the magazines receiving low 
cultural rating. 
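
The two-stage sorting set out in the directions (a coarse sort into three packs, then a finer sort within each pack) maps onto the nine levels in a simple way. The sketch below is illustrative only; the names `COARSE_PACKS` and `final_level` are hypothetical and do not come from the study.

```python
# Coarse packs in the order given in direction 4: placed before cards II, V, VIII.
COARSE_PACKS = {"high": 0, "medium": 1, "below medium": 2}

ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"]


def final_level(coarse: str, fine: int) -> int:
    """Combine the coarse pack with the position inside it.

    fine is 1 for the highest titles of the pack, 2 for the next highest,
    and 3 for the next, following directions 5 to 7.
    """
    if fine not in (1, 2, 3):
        raise ValueError("fine position must be 1, 2, or 3")
    return 3 * COARSE_PACKS[coarse] + fine


level = final_level("medium", 2)
print(level, ROMAN[level - 1])  # a middle title of the medium pack lands on card V
```

Sorting in two stages keeps each decision to a choice among three piles, which may be why the directions proceed in this way rather than asking for a nine-way judgment at once.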


The mean cultural ratings as shown in Table II are derived by 
arranging the separate judgments for each magazine along a line of 
judgment levels assumed to be equally spaced at one unit intervals. 
In all probability, the judgment levels are unequal in distance from 
judge to judge and from magazine to magazine. Yet the only other 
method available for determining a composite estimate of opinion 
demands the assumption of a normal distribution of judgments, which 
when used in these data results in scale values and mean scores for 
each magazine whose rank order correlation with the former method 
is .999, PE .001. Only one magazine, the “American Magazine,” 
changed its rank order as much as three places. Since the relative 
position of this magazine from the mean of the entire distribution, by 
both methods, is the same, the actual comparative difference being 
only .23, the shift is relatively insignificant. 


TABLE II.—CULTURE CONTENT RATINGS OF MAGAZINES ARRANGED IN 
DESCENDING ORDER 

Magazine                      | Mean  | S.D.  | Per cent of judges | Magazine                      | Mean  | S.D.  | Per cent of judges 
Yale Review                   | 1.59  | .47   | 88                 | Good Housekeeping             | 5.34  | 1.19  | 100 
Atlantic Monthly              | 1.64  | [?]   | [?]                | [illegible]                   | 5.4   | 1.51  | 100 
Saturday Review of Literature | 1.80  | .62   | 92                 | Popular Science               | 5.5   | 1.62  | 96 
[illegible]                   | 2.34  | 1.06  | 98                 | Ladies Home Journal           | 5.6   | 1.33  | 100 
[illegible]                   | 2.36  | .83   | 100                | Popular Mechanics             | 5.6   | 1.78  | 100 
[illegible]                   | [?]   | [?]   | [?]                | Saturday Evening Post         | 5.62  | 1.39  | 100 
[illegible]                   | 2.52  | 1.33  | 100                | [illegible]                   | 5.68  | 1.57  | 98 
New Republic                  | 2.54  | 1.36  | 100                | Delineator                    | 5.78  | 1.11  | 100 
[illegible]                   | 2.76  | 1.43  | 86                 | Womans Home Companion         | 5.78  | 1.33  | 100 
Current History               | 2.78  | 1.28  | 98                 | [illegible]                   | 6.08  | 1.51  | 100 
[illegible]                   | 2.86  | 1.23  | 94                 | American Magazine             | 6.19  | 1.37  | 98 
[illegible]                   | 2.9   | 1.40  | 100                | Pictorial Review              | 6.2   | 1.37  | 100 
American Mercury              | 2.92  | 1.89  | 100                | Pathfinder                    | 6.32  | 1.7   | 44 
[illegible]                   | [?]   | [?]   | [?]                | [illegible]                   | 6.38  | 1.75  | 98 
National Geographic           | 3.0   | 1.5   | [?]                | [illegible]                   | 6.58  | 1.2   | 100 
Scientific Monthly            | 3.38  | 1.79  | [?]                | [illegible]                   | 6.98  | 1.95  | 100 
World's Work                  | 3.4   | 1.14  | 100                | Cosmopolitan                  | 7.05  | 1.5   | 98 
Review of Reviews             | 3.52  | 1.79  | 100                | Adventure                     | 7.39  | 1.47  | 70 
Scientific American           | 3.69  | 1.75  | [?]                | [illegible]                   | 7.72  | 1.28  | 98 
[illegible]                   | 3.8   | 1.51  | 100                | Argosy                        | 7.76  | 1.75  | 70 
Golden Book                   | 3.95  | 1.74  | 94                 | College Humor                 | 7.79  | 1.71  | 96 
Literary Digest               | 3.98  | 1.3   | 100                | Physical Culture              | 8.21  | 1.34  | 98 
[illegible]                   | 4.02  | 1.74  | 84                 | Motion Picture Magazine       | 8.5   | 1.07  | 98 
[illegible]                   | 4.04  | 1.63  | 100                | Photoplay                     | 8.5   | .97   | 98 
House Beautiful               | 4.17  | 1.48  | 98                 | Short Stories                 | 8.64  | 1.02  | 70 
[illegible]                   | 4.19  | 1.85  | 70                 | Sport Story Magazine          | 8.68  | .83   | 66 
Readers Digest                | 4.3   | 1.63  | 98                 | Film Fun                      | 8.69  | 1.02  | 86 
House and Garden              | 4.44  | 1.48  | 94                 | Detective Story Magazine      | 8.7   | .98   | 92 
Nations Business              | [?]   | [?]   | [?]                | Real Detective Magazine       | 8.71  | .8    | 94 
Better Homes and Gardens      | 4.75  | 1.4   | 96                 | Complete Story                | 8.72  | 1.06  | 46 
New Yorker                    | 4.9   | 1.6   | 94                 | Western Story                 | 8.75  | .83   | 88 
Parents' Magazine             | 4.95  | 1.61  | 80                 | Love Story Magazine           | 9.2   | .55   | 86 
[illegible]                   | 5.01  | 1.74  | 86                 | Breezy Stories                | 9.21  | .69   | 90 
Vanity Fair                   | 5.1   | 1.72  | 100                | Screen Secrets                | 9.23  | .65   | 88 
Field and Stream              | 5.14  | 1.74  | 100                | True Story                    | 9.28  | .51   | 98 
Theater Magazine              | 5.2   | 2.48  | 88                 | True Confessions              | 9.33  | .47   | 96 
Harpers Bazaar                | 5.26  | 1.29  | 92                 | Mean score                    | 5.55  |       | 
Country Gentleman             | 5.34  | [?]   | 100                | S.D.                          | 2.28  |       | 
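
The three figures reported for each magazine (mean rating, standard deviation, and per cent of judges rating it) follow from the equal-interval assumption described above. The sketch below is a minimal illustration under stated assumptions: the ratings are invented, `None` stands for a discarded card, the standard deviation divides by the number of ratings rather than one less, and the name `summarize_ratings` is hypothetical.

```python
from math import sqrt


def summarize_ratings(ratings, n_judges):
    """Return (mean, standard deviation, per cent of judges) for one magazine.

    ratings holds one entry per judge: a level on the equal-interval scale,
    or None where the judge discarded the magazine.
    """
    kept = [r for r in ratings if r is not None]
    mean = sum(kept) / len(kept)
    # Population form of the standard deviation (an assumption; the source
    # does not say which divisor was used).
    sd = sqrt(sum((r - mean) ** 2 for r in kept) / len(kept))
    per_cent = 100 * len(kept) / n_judges
    return mean, sd, per_cent


# Invented ratings from ten hypothetical judges, two of whom discarded the card.
example = [1, 2, 2, 1, None, 2, 3, 1, 2, None]
mean, sd, per_cent = summarize_ratings(example, n_judges=10)
print(mean, round(sd, 2), per_cent)
```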

In arriving at weights to be assigned to each of the magazines 
the scores obtained by the use of scale values calculated from the 
Normal Probability Curve are used. The sigma distance of the mean 
score of each magazine from the mean of the distribution was deter- 
mined and categories one-half standard deviation in length are used 
as a basis of classification. This results in five groups of magazines 
on either side of the mean to which positive ascending values of one 
each are given, beginning with the group of magazines that is the 
greatest negative distance from the mean of the entire group. 
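
The assignment of weights by half-sigma categories can be sketched as below. Several points are assumptions rather than statements from the source: the sigma distance `z` is taken as positive for magazines of higher culture, the category boundaries are taken to fall at multiples of one-half sigma measured from the mean, and any magazine lying beyond the outermost category is placed in it. The name `cultural_weight` is hypothetical.

```python
from math import floor


def cultural_weight(z: float) -> int:
    """Map a sigma distance from the mean to a weight from 1 to 10.

    Categories are one-half standard deviation wide, five on either side
    of the mean; weight 1 is the group farthest below the mean.
    """
    band = floor(z / 0.5)                # ..., -2, -1, 0, 1, ... in half-sigma steps
    return min(10, max(1, band + 6))     # bands -5..4 become weights 1..10


for z in (-2.3, -1.2, -0.1, 0.1, 1.9, 2.4):
    print(z, cultural_weight(z))
```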

The classification with the cultural-weight preceding each group 
of magazines follows: (1) Love Story Magazine, Breezy Stories, Screen 
Secrets, True Story, True Confessions; (2) Photoplay, Motion Picture 
Magazine, Sport Story Magazine, Real Detective Stories, Detec- 
tive Story Magazine, Short Stories, Film Fun, Western Story, Com- 
plete Story; (3) Argosy, College Humor, Physical Culture; (4) 
McCalls, Cosmopolitan, Redbook, Adventure, Liberty; (5) Popular 
Science, Popular Mechanics, Saturday Evening Post, Ladies’ Home 
Journal, Life, Woman’s Home Companion, Delineator, Colliers, 
Pictorial Review, Pathfinder, Judge, American Magazine; (6) House 
and Garden, Nation’s Business, Better Homes and Gardens, New 
Yorker, Parents’ Magazine, Hygeia, Field and Stream, Vanity Fair, 
Theatre Magazine, Harper’s Bazaar, Country Gentleman, Good 
Housekeeping, Vogue; (7) World’s Work, Review of Reviews, Scientific 
American, Golden Book, Outlook, Literary Digest, Travel, Time, 
House Beautiful, Fortune, Reader’s Digest; (8) Living Age, Current 
History, American Mercury, Asia, Survey, National Geographic, 
Scribners, Scientific Monthly; (9) Bookman, The Nation, Forum, 
Harpers, New Republic; (10) Yale Review, Atlantic Monthly, Satur- 
day Review of Literature. 

The question might be raised as to why the mean rather than the 
median was chosen as the measure of central tendency. Proceeding 
on the principle that the average whose standard error is the least is 
the best measure to use, the mean and the median with their standard 
errors were calculated for eighteen magazines. Sampling the entire 
list of magazines it was found that the standard error of the median 
was higher than the standard error of the mean for sixteen magazines, 
and equal for two. Hence, the mean was chosen as the better measure 
to use. 

Since not all of the magazines were rated by an equal number of 
the judges, an attempt was made to determine the effect of combin- 
ing the various percentages to secure a composite rating. Table III 
gives the percentage of judges, the number of magazines in each class, 
the mean score, sigma for each category, and the difference between 
means for successive levels divided by the sigma of the difference. 


TABLE III.—COMPARATIVE ANALYSIS OF MEAN CULTURAL SCORE OF MAGAZINES 
RATED BY VARYING PERCENTAGES OF JUDGES 

Percentage class | Number of magazines | Mean cultural | Sigma | D / sigma of | Chances in thousand of 
                 | in each class       | rating        |       | difference   | difference greater than zero 
100              | 28                  | 4.37          | 1.42  |              | 
90 to 99         | 27                  | 5.98          | 2.42  | .57          | 716 
80 to 89         | 12                  | 5.66          | 2.90  | .09          | 536 
Less than 80     | 7                   | 7.38          | 1.50  | .57          | 716 

The table should be read as follows: Twenty-eight magazines were 
rated by one hundred per cent of the judges, twenty-seven by ninety 
to ninety-nine per cent of the judges, etc. It will be noted that the 
probability of our composite mean score having been distorted by 
treating all the magazines as a unit, regardless of the per cent of judges 
rating them, is slight. The difference between the successive mean 
scores given in Table III is small. The largest difference is between 
the magazines judged by one hundred per cent and those judged by 
less than eighty per cent of the judges. This difference divided by 
its standard error is 1.46 and the chances in one thousand of a differ- 
ence in the same direction in the case of similarly selected populations 
are nine hundred twenty-eight. However, this difference is not 
sufficiently large to justify throwing out these seven magazines. 
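
The ratio of 1.46 and the chances of nine hundred twenty-eight in one thousand can be reproduced with a short sketch. It rests on an assumption inferred from the figures and not stated in the text: that the sigma of the difference is the square root of the sum of the squared sigmas of the two classes. The chances are read from the normal curve. The function names are hypothetical.

```python
from math import erf, sqrt


def critical_ratio(mean_a: float, sigma_a: float,
                   mean_b: float, sigma_b: float) -> float:
    """Difference between two means divided by the sigma of the difference,
    the latter taken (by assumption) as sqrt(sigma_a**2 + sigma_b**2)."""
    return abs(mean_a - mean_b) / sqrt(sigma_a ** 2 + sigma_b ** 2)


def chances_in_thousand(ratio: float) -> int:
    """Chances in 1000 that the true difference is greater than zero,
    from the cumulative normal curve."""
    return round(1000 * 0.5 * (1 + erf(ratio / sqrt(2))))


# Magazines rated by 100 per cent of judges vs. those rated by less than 80 per cent.
ratio = round(critical_ratio(4.37, 1.42, 7.38, 1.50), 2)
print(ratio, chances_in_thousand(ratio))
```

The same two functions applied to the first two classes of Table III (means 4.37 and 5.98, sigmas 1.42 and 2.42) give a ratio of .57, in agreement with the table.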

An analysis of the separate ratings for men and women judges 
shows no sex differences. Within their sampling errors the ratings 
were the same. In no instance was the magnitude of the difference 
in variability of judgment between men and women sufficient to 
predict a reversal in classification in another drawing of similarly 
equipped judges. The mean score for the men is 5.62 ± 2.25; for 
the women 5.51 ± 2.29; the Pearson product moment correlation 
between their ratings is .982, PE .003. 

Since it was thought that the judges with University connections 
would tend to over-rate the so-called “highbrow” magazines and to 
under-rate those at the opposite extreme, a comparison of the judg- 
ments of the twenty-six judges having official University connections 
was made with those of the non-university group. Apparently, if 
this factor is operating, it is operating for both groups since the 
Pearson correlation between the two is .963 PE .006. 

How good an index are the factors of circulation figures and 
price per copy to the cultural content of the magazine? The Pearson 
r between the mean rating of the magazine and its circulation figure, 
as given in the Ayres Newspaper guide, is —.2935 PE .072. From the 
scattergram, the correlation appears to be curvilinear, and hence 
the etas were obtained. These are .677 PE .043 and .639 PE .046. 
The same procedure was followed for the correlation between price 
per copy and mean ratings. The Pearson r is —.444 PE .063, and 
the etas are .600 PE .050 and .597 PE .050. Thus our data indicate 
that the factors of circulation and price per copy are not independent
of cultural ratings.
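
The probable errors attached to the Pearson coefficients above appear consistent with the classical formula PE = .6745(1 - r²)/√N. The sketch below is only a check under two assumptions that the text does not state: that this formula was the one used, and that N is the seventy-four magazines obtained by summing the classes of Table III. It does not apply to the etas.

```python
from math import sqrt


def probable_error_of_r(r: float, n: int) -> float:
    """Classical probable error of a Pearson r: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1.0 - r * r) / sqrt(n)


# Assumption: N is the total of the class sizes in Table III.
N_MAGAZINES = 28 + 27 + 12 + 7

print(round(probable_error_of_r(-0.2935, N_MAGAZINES), 3))  # rating vs. circulation
print(round(probable_error_of_r(-0.444, N_MAGAZINES), 3))   # rating vs. price per copy
```

With these assumptions the two values come out at .072 and .063, the probable errors quoted for the correlations with circulation and with price per copy.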

The uses to be made of such judgments as here described are 
primarily in the construction of home-rating scales and in nature- 
nurture research. They can serve as a ready factor descriptive of 
home differences and as suggestive possibly of differences in the school 
performance of children. 





1 A table showing the separate judgments of men and women may be secured
from the Institute of Child Welfare, University of Minnesota, Minneapolis, 
Minnesota. 


A COMPARISON OF THE DIFFICULTY AND VALIDITY
OF ACHIEVEMENT TEST ITEMS


LORNE J. HENRY 
Central Technical School, Toronto 


1. INTRODUCTION 


When a teacher elects to use objective tests for measuring achieve- 
ment in a school subject, he is confronted with the task of constructing 
tests that will meet the requirements of validity. That is, a test must 
measure achievement, and that achievement must be in the school 
subject in connection with which the test is to be used. An objective 
test carelessly constructed may have no greater validity, and may have 
even less validity, than the traditional essay-type of examination. 
On the other hand, an objective test may possess a high degree of 
validity if the test maker takes adequate care in its construction. 
Herein lies one of the chief advantages of the new-type test. 


2. PROBLEM 


The problem of this study was to determine to what extent the 
single factor of the difficulty of a test item is related to the validity of 
that item. Should a teacher, for example, in the interests of validity, 
when constructing an objective test, endeavour to make the items 
easy, difficult, or of medium difficulty? 


3. DATA 


The data used consisted of the responses of one hundred pupils 
on an objective test in physiography consisting of one hundred eight 
items of the five-response type. The one hundred pupils constituted a 
sampling of a total of nine hundred sixty pupils who tried the test. 


4. PROCEDURE 


The difficulty of each item was determined by calculating the num- 
ber of pupils who had the item right. The greater the frequency of 
correct responses the easier the item. The items were then divided 
into three groups, as follows: (1) The thirty-six easiest items; (2) the
thirty-six most difficult items; and (3) the thirty-six remaining items,
called the “medium” group. The right responses of the pupils were
counted on each group of items.
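
The grouping procedure just described can be sketched in Python as follows. The data layout (one list of right/wrong marks per pupil) and the function names are assumptions made for illustration; ties in difficulty are broken by the original order of the items, a detail the text does not specify.

```python
def split_by_difficulty(responses):
    """Split item indices into (easiest, medium, hardest) thirds.

    responses[p][i] is 1 (or True) when pupil p had item i right.
    """
    n_items = len(responses[0])
    # Difficulty of an item = number of pupils who had it right.
    correct_counts = [sum(pupil[i] for pupil in responses) for i in range(n_items)]
    # Easiest first: more correct responses means an easier item.
    ranked = sorted(range(n_items), key=lambda i: correct_counts[i], reverse=True)
    third = n_items // 3
    easiest = ranked[:third]
    hardest = ranked[n_items - third:]
    medium = ranked[third:n_items - third]
    return easiest, medium, hardest


def group_score(pupil, items):
    """Number of right responses of one pupil on one group of items."""
    return sum(pupil[i] for i in items)


# Tiny illustrative data set: four pupils, six items.
responses = [
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
    [1, 0, 0, 0, 0, 0],
]
easiest, medium, hardest = split_by_difficulty(responses)
print(easiest, medium, hardest)
print([group_score(pupil, easiest) for pupil in responses])
```

With one hundred eight items the same function would return three groups of thirty-six items each.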

To compare the three groups of items as to validity, five sets of
correlations were obtained. A validity value was calculated for each
item by the method known as the Biserial coefficient of correlation.
The thirty-six items having the highest validity values were selected,
and the scores of the pupils on these “best” items were counted. Then
the pupils’ scores on each of the “difficulty” groups were correlated
with their scores on the “best” group selected by the Biserial r
method. The same procedure was followed with the Clark validity
method, the Vincent validity method, the Upper and Lower
Thirds method, and a combination of these. The validity methods
are explained below.

(a) Biserial r.—The formula for this method is as follows:¹

$$r = \frac{M_1 - M_2}{\sigma} \cdot \frac{pq}{z}$$

where $M_1$ = mean score of the pupils having the item right
$M_2$ = mean score of the pupils having the item wrong
$p$ = percentage of pupils having the item right
$q$ = percentage of pupils having the item wrong
$\sigma$ = standard deviation of all criterion scores
$z$ = the ordinate of the normal probability curve cutting off
$p$ cases, as read from table in Appendix C, Kelley:
Statistical Method.
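
As an illustration of the formula, the following Python sketch computes a biserial r for one item. It is not the author's procedure: the ordinate z is computed from the normal curve rather than read from Kelley's table, p and q are taken as proportions rather than percentages, and the function name and sample data are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def biserial_r(criterion_scores, item_right):
    """Biserial r between a right/wrong item and the criterion scores."""
    right = [s for s, ok in zip(criterion_scores, item_right) if ok]
    wrong = [s for s, ok in zip(criterion_scores, item_right) if not ok]
    if not right or not wrong:
        raise ValueError("the item must be passed by some pupils and failed by others")
    p = len(right) / len(criterion_scores)   # proportion having the item right
    q = 1.0 - p                              # proportion having the item wrong
    sigma = pstdev(criterion_scores)         # standard deviation of all criterion scores
    std_normal = NormalDist()
    # Ordinate of the unit normal curve at the point cutting off p of the cases.
    z = std_normal.pdf(std_normal.inv_cdf(p))
    return (mean(right) - mean(wrong)) / sigma * (p * q / z)


scores = [1, 2, 3, 4, 5, 6, 7, 8]
right = [0, 0, 0, 0, 1, 1, 1, 1]
print(round(biserial_r(scores, right), 3))
```

A biserial coefficient computed on a small, sharply divided sample such as this one can exceed 1.0; the coefficient is bounded only when the criterion scores are approximately normal.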


(b) The Clark Method.—The Clark formula² for evaluating test
items is as follows:

$$V = \frac{P - D}{1 - D}$$

where $V$ = validity of the item
$D$ = percentage of pupils who fail to answer the item correctly;
it is the difficulty of the item
$P$ = percentage of the criterion group who fail to answer the
item correctly. The criterion group is the $D$ percentage
of the class having the lowest scores.


This method may be illustrated as follows: Suppose that four 
hundred pupils out of one thousand have an item wrong, and that, 


1 Barthelmess, H. M.: The Validity of Intelligence Test Elements. New York: 
Bureau of Publications, Teachers College, Columbia University, 1931, p. 13. 

2 Clark, E. L.: “A Method of Evaluating the Units of a Test.” Journal of
Educational Psychology, Vol. XIX, 1928, pp. 263-265.







of the four hundred pupils having the lowest scores, two hundred forty 
fail the item. Then D = .40 and P = .60. Substituting in the for- 
mula we have 


$$V = \frac{.60 - .40}{1 - .40} = .33$$
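
The worked example can be checked with a short Python sketch. The function and argument names are illustrative; the criterion group is taken, as in the definition above, to be as large as the number of pupils failing the item.

```python
def clark_validity(n_pupils, n_failing, n_failing_in_criterion_group):
    """Clark validity value V = (P - D) / (1 - D), computed from counts."""
    if not 0 < n_failing < n_pupils:
        raise ValueError("the item must be failed by some pupils but not by all")
    d = n_failing / n_pupils                  # difficulty of the item
    criterion_group_size = n_failing          # the D per cent with the lowest scores
    p = n_failing_in_criterion_group / criterion_group_size
    return (p - d) / (1.0 - d)


# Four hundred of one thousand pupils fail the item; two hundred forty of
# the four hundred lowest-scoring pupils fail it.
print(round(clark_validity(1000, 400, 240), 2))
```

On this definition an item failed only by the lowest-scoring pupils has P = 1 and hence V = 1, while an item on which the criterion group fails no oftener than the class as a whole has V = 0.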

(c) The Vincent Method.—The Vincent validity method¹ evaluates
an item by measuring the extent to which the poor pupils overlap
upon the good pupils on that item. Overlapping is measured by
calculating the percentage of pupils failing the item who have criterion
scores higher than the median score of the pupils passing the item.

The following are the steps in the calculation of the Vincent score: 

1. Arrange the pupils in the order of scores, beginning with the 
highest. 

2. Find the total number of pupils having the item right. 

3. Locate the median passing score, that is, the score of the middle 
pupil of those passing the item. 

4. Count the number of failures above the median passing score. 

5. Express the failures above the median passing score as a per- 
centage of the total failures. 

6. In ranking items by this method, that item ranks highest which 
shows the smallest percentage of overlapping. 
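
The six steps above may be sketched in Python as follows. The function name and sample data are hypothetical, and one detail is an assumption: when an even number of pupils pass the item, the median passing score is taken as the average of the two middle scores, a case the steps do not cover.

```python
from statistics import median


def vincent_overlap(criterion_scores, item_right):
    """Percentage of failing pupils who score above the median passing score."""
    passing = [s for s, ok in zip(criterion_scores, item_right) if ok]
    failing = [s for s, ok in zip(criterion_scores, item_right) if not ok]
    if not passing or not failing:
        raise ValueError("the item must be passed by some pupils and failed by others")
    median_passing_score = median(passing)
    failures_above = sum(1 for s in failing if s > median_passing_score)
    return 100.0 * failures_above / len(failing)


scores = [90, 85, 80, 75, 70, 65, 60, 55]
right = [1, 0, 1, 1, 0, 1, 0, 0]
print(vincent_overlap(scores, right))
```

Items would then be ranked by sorting on this percentage in ascending order, the smallest overlap ranking highest.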

(d) The Upper and Lower Thirds Method.²—According to this
method the value assigned to an item is the difference between the 
number of pupils having the item right among the third of the pupils 
having the highest total scores, and the number having it right among 
the third having the lowest scores. 
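
A minimal Python sketch of the upper and lower thirds value follows. The names are illustrative, and the handling of classes whose size is not divisible by three (the remainder is left in the middle group) is an assumption, since the text does not discuss that case.

```python
def upper_lower_thirds_value(total_scores, item_right):
    """Rights among the top third of pupils minus rights among the bottom third."""
    # Pupil indices ordered from highest to lowest total score.
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i], reverse=True)
    third = len(order) // 3
    upper = order[:third]
    lower = order[len(order) - third:]
    return sum(item_right[i] for i in upper) - sum(item_right[i] for i in lower)


total_scores = [9, 8, 7, 6, 5, 4, 3, 2, 1]
item_right = [1, 1, 1, 1, 0, 0, 0, 1, 0]
print(upper_lower_thirds_value(total_scores, item_right))
```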

(e) A Combination of the Above Methods.—When the thirty-six
best items selected by the four validity methods mentioned above were 
compared, it was found that seventeen items were common to all. 
Hence, these seventeen items were used as the basis for a further set 
of correlations. 


5. RESULTS 


The resulting coefficients of correlation, together with their
probable errors, are given in Table I.





1 Vincent, Leona: A Study of Intelligence Test Elements. New York: Bureau of 
Publications, Teachers College, Columbia University, 1924, p. 11. 

2 Lentz, Theo. F. Jr. et al.: “Evaluations of Methods of Evaluating Test Items.”
Journal of Educational Psychology, Vol. XXIII, 1932, pp. 344-350.


TABLE I.—THE COEFFICIENTS OF CORRELATION BETWEEN THE SCORES ON THE
“DIFFICULTY” GROUPS OF ITEMS AND THE SCORES ON THE VALIDITY GROUPS

                              Easiest            Medium             Hardest
                              thirty-six         thirty-six         thirty-six

                              r       PE         r       PE         r       PE

Biserial r                   .785   ± .026      .827   ± .022      .801   ± .024
Clark                        .807   ± .024      .925   ± .010      .813   ± .023
Vincent                      .400   ± .081      .827   ± .022      .826   ± .021
Upper and lower thirds       .731   ± .031      .889   ± .014      .863   ± .018
Combination                  .704   ± .034      .796   ± .025      .758   ± .028

A study of the data given in Table I raises the question as to the 
reliability of the differences obtained. Would the same procedure 
used with another test yield similar results, or is the superiority of one 
group over another a chance superiority which another test might 
reverse? 


TABLE II.—THE COEFFICIENTS OF RELIABILITY OF THE DIFFERENCES AMONG
THE r’s OF TABLE I

                                      Coefficient of reliability

                                      Medium          Difficult

Easy
  Biserial r                           −1.2            −0.46
  Clark                                −3.5            −0.18
  Vincent                              −2.5            −2.53
  Upper and lower thirds               −4.6            −3.71
  Combination                          −2.2            −1.30
Medium
  Biserial r                                            0.79
  Clark                                                 4.89
  Vincent                                               0.04
  Upper and lower thirds                                1.12
  Combination                                           1.03

The reliability of the difference between two coefficients of
correlation may be found by the formula¹

$$\text{Coefficient of Reliability} = \frac{D}{PE_{\text{diff.}}}$$





1 Garrett, Henry E.: Statistics in Psychology and Education. New York: 
Longmans, Green and Company, 1926, pp. 171-172. 

where $D$ = the difference between the two r’s
and

$$PE_{\text{diff.}} = \sqrt{PE_{r_1}^2 + PE_{r_2}^2}$$


The application of this formula to the differences among the r’s
in Table I yields the reliability coefficients given in Table II. This
table reads from left to right, and the minus sign indicates that the
superiority is in favour of the group mentioned along the top of the
table. Thus, the coefficient of reliability of the difference between
the “Easy” and “Medium” groups, when each is correlated with the
thirty-six best items selected by Biserial r, is 1.2, and the difference is in
favour of the “Medium” group.
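
The computation behind Table II can be sketched in a few lines of Python. The function name is illustrative; the sign convention (row group minus column group, so that a minus sign favours the group along the top) follows the description above.

```python
from math import sqrt


def reliability_of_difference(r_row, pe_row, r_col, pe_col):
    """Difference between two r's divided by the probable error of the difference."""
    pe_diff = sqrt(pe_row ** 2 + pe_col ** 2)
    return (r_row - r_col) / pe_diff


# Easy vs. Medium, Biserial r (values from Table I).
print(round(reliability_of_difference(0.785, 0.026, 0.827, 0.022), 1))
# Easy vs. Medium, upper and lower thirds (values from Table I).
print(round(reliability_of_difference(0.731, 0.031, 0.889, 0.014), 1))
```

These two calls give -1.2 and -4.6, the corresponding entries of Table II.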


6. CONCLUSIONS 


On the basis of the data presented in Tables I and II, the following 
conclusions appear justified: 

1. There is no reliable superiority of any one of the groupings of
items according to difficulty over any other grouping. A significant
difference is indicated only when the coefficient of reliability between
two r’s is four. This occurs in but two instances.

2. Some superiority of the “Medium” over the “Easy” group is
indicated. This may be accounted for by the fact that a few very easy 
items were placed at the beginning of the test to encourage the pupils. 
On one item the percentage of passes was one hundred, and on two 
others, ninety-eight each. Obviously, items of this sort have little 
validity value, though they have another value, as suggested. 

3. Except in the case of the Clark method, which apparently 
favours items of medium difficulty, there is little more than a chance 
difference between the “Medium” and “Difficult” groups.

4. Apart from extreme items—those on which nearly all the pupils 
pass, or those on which nearly all fail—the difficulty of an item has 
little to do with its validity. 


CORRELATION BETWEEN SOME PERSONALITY TEST
SCORES OF SIBLINGS AND INTERCORRELATIONS
BETWEEN THE SCORES


A. B. KOCH 
Emporia High School 


AND 


J. B. STROUD 
Kansas State Teachers College 


This study was undertaken primarily to determine the relation- 
ship between certain personality test scores of two groups of siblings, 
and secondarily to determine the intercorrelations between the test 
scores of the subjects involved. The subjects were three hundred 
twenty-four college and senior high school students, making one 
hundred sixty-two pairs. Measurements were made in intelligence, 
personal adjustment, socio-economic status, and in traits of intro- 
version-extraversion. 

As a measure of intelligence, the scores on the Haggerty Intelligence 
Examination Delta two were used for the high school students, and the 
scores on the Kansas State Teachers College Entrance Test were used 
for the college students. These intelligence tests were administered 
as a part of the routine testing program of the two institutions, and 
the scores were supplied by the administrative officers in charge. 
A test for socio-economic status somewhat similar to Sim’s Home
Rating Test was employed, the items covering professional and
occupational status; physical conditions of the home (manner of heating,
number of rooms, and kind of furnishings); and other factors such as
schooling of parents, number of children in the family, size of family
library, social position, and the like. Incidentally, the reliability
of this test as indicated by the correlation between the scores of
siblings was found to be .88. Thus, we may assume that the test
possesses satisfactory reliability. As a measure of personal adjustment,
a combined test was used which consisted of forty items from
the Thurstone Personality Schedule (the forty items which, as he
indicated, have the greatest differentiating significance) and thirty-six
items from the Woodworth Psychoneurotic Inventory. Those items
were selected from the Woodworth test which appeared to have
the least overlapping with the Thurstone test. The Neymann-Kohlstedt
Diagnostic Test was used as a purported test of
introversion-extraversion.¹

The tests, with the exception of the intelligence tests, were
administered by the first-named author. They were given individually or
in such small groups as could be assembled without disturbance to 
school routine. 


RESULTS 


The average score on the introversion-extraversion test is +6.02, 
indicating a slight tendency toward extraversion. The average num- 
ber of neurotic answers (answers signifying maladjustment) is 23.14. 
The answers marked “doubtful” by the subjects were scored as
neurotic answers. 

Correlation between Intelligence Test Scores of Siblings.—A correla- 
tion of .41 was obtained between the test scores of the siblings which 
comprised the college group. The correlation between the scores 
of siblings comprising the high school group was .63. These results 
agree in the main with other findings. The correlations between 
the intelligence test scores of siblings reported in the literature range 
from about .30 to .65, the average being close to .50. 

Relation between the Personal Adjustment Scores of Siblings.—A 
correlation of .10 + .056 was obtained between the scores of siblings 
on the personal adjustment test (the Thurstone-Woodworth test). 
This correlation coefficient signifies a negligible relationship between 
these two variables. Should subsequent studies likewise fail to find 
any significant relationship between maladjustment scores of siblings, 
such results would seem to make it hazardous to generalize about the 
influence of home conditions upon such traits as are measured by 
these tests. Lack of significant relationship here suggests that the 
factors which contribute to maladjustment must be evaluated sepa- 
rately for each individual. It is a well-established principle that it is 
the effect of objective stimulating conditions rather than their inherent
nature which modifies personality. The same objective stimulus 
affects different individuals in different ways. In other words, the 
meaning of any particular stimulus depends not only upon the nature 





1 A second experiment is in progress in which Thurstone’s personality schedule 
is used in its entirety and in which Sim’s Home Rating Test is used as a measure 
of socio-economic status. In this study the Ascendence-Submission Test is also 
used. 
of the stimulus and the contextual setting under which it appears
but also upon the nature of the individual—his past experiences, 
his inheritance, and his mental and physiological conditions at the time 
of the experience. It is easy to understand how any given event or 
any protracted series of home influences may affect different members 
of the family in different ways. 

Relation between Introversion-extraversion Scores of Siblings.—The 
results of the present study fail to indicate any significant relationship 
between the introversion-extraversion scores of siblings, as is shown 
by the negligible correlation of .09 + .056. Assuming that such traits 
exist, and that this test measures them, these data likewise suggest 
that their causes must be discovered in the case of each individual. 
The lack of correlation between the manifestations of such traits by
siblings prevents one from generalizing regarding the effect of
environmental influences which supposedly promote such traits.

One further interpretative principle is suggested. The foregoing 
results indicate that those traits which are measured by these tests are 
acquired. The writers insist that, while the presence of a significant 
correlation between the measurements of any trait exhibited by siblings 
is no proof that the trait is hereditary, the absence of such relationship 
does show that the trait is not hereditary. There are, of course, 
many instances where positive findings do not establish a point, but 
where negative findings do disprove it. A case in point is Galton’s 
findings to the effect that the incidence of expression of certain behavior 
traits on the part of siblings correlated to the extent of about .50, 
this being about the magnitude of the correlation obtained between 
such physical traits of siblings as color of the eyes and of the hair. 
Galton reasoned thus: Since the correlation between mental traits 
of siblings is as high as that between physical traits—traits which are 
admittedly innate, these mental traits must also be innate. The 
fallacy of such reasoning is obvious when one bears in mind that the
argument can be reversed with equal force. There is doubtless a high 
correlation between church affiliation of siblings or between their 
allegiance to political parties, factors which are admittedly
environmental. Let us suppose that the correlation is as high as .85; would
anyone say: “Therefore, all traits in which there is a correlation
of .85 between their incidence among siblings are environmental
in origin”? On the other hand, should one find that mental traits
of siblings do not correlate as highly as those traits which are known 
to follow the usual biological laws, he would be justified in assuming 
that they are not inherited to the same degree, in the event, of course,
that the traits have been accurately measured. Furthermore, if 
no correlation appears between the manifestations of a particular 
trait by siblings, we are warranted in assuming that the trait is not 
hereditary. A correlation between traits of siblings does not prove 
that the traits are innate, but the absence of such correlation does prove 
that they are not innate. 

It is just possible that differences in age of siblings may disturb 
the relationship between their personality test scores. However, 
the relationship between the scores of six pairs of twins was no closer 
than that between siblings. One member of a pair of identical twins 
scored +36 on the introversion-extraversion test while the other scored 
—.28. Nevertheless, the number of cases is too small to warrant a 
generalization. 

Relationship between the Socio-economic Scores of Siblings.—Computation
of the correlation between the socio-economic scores of
siblings was made for the sole purpose of testing the reliability of the 
test. Each pair of siblings was asked to report upon exactly the same 
home conditions, and the degree of correspondence between the scores 
affords an excellent criterion of the trustworthiness of such reports. 
As was stated above, a correlation of .88 was obtained in this case. 

Relationship between the Test Scores.—Four tests have been used
altogether. It is possible by the method of correlation to determine
the degree to which the tests are independent measures. The lack
of significant correlation between two reliable tests indicates that they 
measure different variables. A significant correlation between two 
reliable tests signifies that the tests either measure the same thing or 
that the traits measured by each are correlated. If two tests be 
both reliable and valid, a significant correlation between them signifies 
that the traits measured by them are related to each other. The 
results of the present study are listed in the following table. 


TABLE I.—INTERCORRELATIONS BETWEEN TEST SCORES

                                                  3        4        5

1. Intelligence test (college)                   .12     −.02      .23
2. Intelligence test (high school)              −.03     −.05      .13
3. ........................                               .08      .05
4. ........................                                        .10

With the possible exception of the correlation between the intelli-
gence test scores and the scores on the socio-economic test, none of 
these correlations is significant. Since these tests possess satisfactory 
reliability, we may conclude that the variables which they measure 
are largely independent of each other. 

As a further means of studying the relationship between intelligence
test performance and socio-economic status, the socio-economic
scores of the students ranking in the upper ten per cent on the
intelligence tests were compared with those of the students falling in the
lowest ten per cent on the intelligence tests. The highest ten per
cent in intelligence test performance made an average score of 55.6
on the socio-economic test, while the average score on the same test
of the lowest ten per cent in intelligence test performance was only
41.4; the average score for the entire group of three hundred
twenty-four students was 46.05. It frequently happens that considerable
difference is observed between the extremes of a distribution even 
when correlations based upon the whole distribution are relatively low. 


CONCLUSIONS 


1. No significant relationships were found between the personal 
adjustment scores of siblings or between their introversion-extraversion 
scores. Significant correlations were obtained between the intelligence 
test scores of siblings, these being .63 for high school students and .41 
for college students. 

2. The only significant intercorrelation between the test scores 
was obtained between the intelligence test and the socio-economic test. 
Even in this case the correlations are low and statistically significant 
only in case of the college students. 


A NOTE ON EFFICIENCY OF PREDICTION
WALTER S. MONROE


University of Illinois 


In the March, 1934, number of the JOURNAL OF EDUCATIONAL
PSYCHOLOGY Douglass¹ presents evidence that the efficiency of
predictions from the regression equation, as measured by the formula

$$E = 1 - \sqrt{1 - r_{01}^2},$$

is materially less than the observed efficiency when the predictions
are compared with “random guesses.” This formula is derived by
comparing the standard error of estimate ($\sigma_{0.1} = \sigma_0\sqrt{1 - r_{01}^2}$) with
$\sigma_0$, which is the standard error of estimate when $M_0$ is used as the
prediction for all members of the population:

$$\sigma_{0.M} = \sqrt{\frac{\Sigma (X_0 - M_0)^2}{N}} = \sigma_0.$$


The standard error of estimate for chance predictions ($X_g$) is derived
as follows. If the chance predictions are assumed to have the same
mean and the same standard deviation as the criterion $X_0$, the standard
error of estimate is given by

$$\sigma_{0.g} = \sqrt{\frac{\Sigma (X_0 - X_g)^2}{N}} = \sqrt{\frac{\Sigma (x_0 - x_g)^2}{N}} = \sqrt{\frac{\Sigma x_0^2}{N} - \frac{2\Sigma x_0 x_g}{N} + \frac{\Sigma x_g^2}{N}}.$$

Since $x_0$ and $x_g$ are uncorrelated and $\sigma_0 = \sigma_g$,

$$\sigma_{0.g} = \sqrt{\sigma_0^2 + \sigma_g^2} = \sqrt{2\sigma_0^2}.$$


If the standard error of estimate $\sigma_{0.1} = \sigma_0\sqrt{1 - r_{01}^2}$ is compared
with this statistic, $1 - \frac{\sqrt{1 - r_{01}^2}}{\sqrt{2}}$ is obtained as a measure of
improvement over chance prediction or random guess. In order to avoid
confusion, we may write

$$E_M = 1 - \sqrt{1 - r_{01}^2}$$

$$E_g = 1 - \frac{\sqrt{1 - r_{01}^2}}{\sqrt{2}}$$


1 Douglass, H. R.: “Some Observations and Data on Certain Methods of
Measuring the Predictive Significance of the Pearson Product-Moment Coefficient
of Correlation.” Journal of Educational Psychology, Vol. XXV, March, 1934,
pp. 225-232.



Although Bailor, to whom Walker¹ credits the first use of the term
“predictive index” and presumably the formula for $E_M$, makes
clear² that this statistic measures the improvement of the regression
equation values over the mean ($M_0$) as predictions, the interpretation
“percentage of improvement over chance in prediction” has been
commonly made. This is unfortunate because the values of $E_g$
are materially larger than the corresponding ones for $E_M$.
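
The size of the discrepancy between the two indices is easily tabulated. The following Python sketch evaluates both for a few values of r; the function names are illustrative, and the values of r are arbitrary.

```python
from math import sqrt


def efficiency_over_mean(r):
    """Improvement of regression estimates over predicting the mean for everyone."""
    return 1.0 - sqrt(1.0 - r * r)


def efficiency_over_guess(r):
    """Improvement of regression estimates over random guesses that have the
    same mean and standard deviation as the criterion."""
    return 1.0 - sqrt(1.0 - r * r) / sqrt(2.0)


for r in (0.0, 0.3, 0.6, 0.9):
    print(f"r = {r:.1f}   over mean = {efficiency_over_mean(r):.3f}"
          f"   over guess = {efficiency_over_guess(r):.3f}")
```

Even at r = 0 the index computed against random guesses is about .29, since predicting the mean for every case is itself better than guessing.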

If true measures ($X_\infty$) are made the criterion instead of fallible
measures ($X_0$), the two corresponding formulae are

$$E_{\infty M} = 1 - \sqrt{1 - r_{\infty 1}^2}$$

$$E_{\infty g} = 1 - \frac{\sqrt{r_{00} - r_{01}^2}}{\sqrt{1 + r_{00}}}$$
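
The step from the fallible to the true criterion is not spelled out in the note. One way of arriving at expressions of this kind is sketched below. It rests on assumptions that are not stated in the text: the usual relations $\sigma_\infty^2 = r_{00}\sigma_0^2$ and $r_{\infty 1} = r_{01}/\sqrt{r_{00}}$, and random guesses that keep the standard deviation $\sigma_0$ and are uncorrelated with $X_\infty$.

$$\sigma_{\infty.1} = \sigma_\infty\sqrt{1 - r_{\infty 1}^2} = \sigma_0\sqrt{r_{00} - r_{01}^2}$$

$$\sigma_{\infty.M} = \sigma_\infty = \sigma_0\sqrt{r_{00}}, \qquad \sigma_{\infty.g} = \sqrt{\sigma_\infty^2 + \sigma_0^2} = \sigma_0\sqrt{1 + r_{00}}$$

$$1 - \frac{\sigma_{\infty.1}}{\sigma_{\infty.M}} = 1 - \sqrt{1 - \frac{r_{01}^2}{r_{00}}}, \qquad 1 - \frac{\sigma_{\infty.1}}{\sigma_{\infty.g}} = 1 - \frac{\sqrt{r_{00} - r_{01}^2}}{\sqrt{1 + r_{00}}}$$

When $r_{00} = 1$ the two expressions reduce to the formulae for the fallible criterion.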


1 Walker, Helen M.: Studies in the History of Statistical Method. Baltimore: 
The Williams and Wilkins Company, 1929, p. 186. 

2 Bailor, E. M.: “Content and Form in Tests of Intelligence.” Teachers
College, Columbia University Contributions to Education, No. 162. New York:
Bureau of Publications, Teachers College, Columbia University, 1924, p. 25.


BOOK REVIEWS


CLARA BASSETT. Mental Hygiene in the Community. New York:
The Macmillan Co., 1934, pp. 394. 


Any field that is inclusive in its foundations and rich in its possible 
applications is in danger of being over-popularized and hence is in 
frequent need of reorientation and reintegration. Mental hygiene 
is such a field. Furthermore, it is a relatively new field. It is 
dependent on fields of knowledge in which rapid advances are taking 
place, and in its applications it is related to social, political, and 
economic attitudes that are at the present time in a state of flux. 
An attempt at reorientation in such a field, then, calls for a familiarity 
with growing sciences as well as an appreciation of human needs 
in an evolving society; and a sense of proportion as well as a courageous 
outlook. This is not a small order. Mental Hygiene in the Community
is such an attempt. The nature of the effort is indicated in the following
paragraphs. 

Mental hygiene is science. It is art. It is a movement. Con- 
sidered as a science, it is a complicated gestalt or configuration of 
many sciences which serve it as foundations, and includes all con- 
tributions from the biological and social sciences which throw light 
on the nature of personality. As an art, it concerns itself with prob- 
lems of behavior in home, school, industry and every other kind of 
institution that deals with human beings. Considered as a move- 
ment, it is related to public health education as well as to social and 
economic movements of the day. In Mental Hygiene in the Community 
Miss Bassett, a consultant psychiatric social worker for the National
Committee for Mental Hygiene, is largely concerned with mental
hygiene as an art and as a movement; that is, with the field of mental 
hygiene in action. Some of the fields of action that she discusses in 
separate chapters are medicine, nursing, social work, law as related 
to delinquency, teacher training, theological training, industrial 
management and recreational education. Besides, there is included 
in this book a chapter on the pre-school child and a last chapter on 
psychiatric institutions and agencies. 

In her discussions of the many aspects of the field, she shows, by and
large, an actual familiarity, as well as a feeling of familiarity, with
the work, particularly as it confronts the social worker. On the whole,
the authorities she relies on in the treatment of special topics are
well selected and their findings or attitudes intelligently interpreted.
As is to be expected, references to papers presented at the International
Congress on Mental Hygiene are frequent in all chapters.
In the discussion of parental
education, Dr. Lawson Lowrey and Bertha Reynolds are the authorities 
most frequently used. In this connection more attention might have 
been given to the field of adult education, which is becoming of 
increasing social significance at the present moment. In the chapter 
on the pre-school child, perhaps not enough attention is given to the 
experimental studies in the field, studies by Gesell or Meek, for 
example. In discussing problems of education her chief mainstay 
is Dr. Rugg. In her discussion of recreation she accepts Stuart 
Chase as the philosopher and Claudia Wanamaker as the technician. 
Her last chapter on “Mental Hygiene and Psychiatric Institutions 
and Agencies” includes brief descriptions of mental hygiene societies,
state departments of mental hygiene, hospitals for mental disease, 
institutions for the feebleminded, psychopathic hospitals, psychiatric 
clinics, child guidance clinics, psychological clinics. Every chapter, 
aside from the introductory chapter, is followed by a fairly thorough- 
going list of questions and suggestions. 

As a point of reference for familiarizing one’s self with the whole 
mental hygiene movement today, this is probably the most adequate 
book on the market to date. The book should be of use not only
to people in the field of mental hygiene but also as a reference book
and as a text in beginning classes in mental hygiene. The style of
writing is not as expressive as the outlook is courageous, but the style,
considering the difficulties of the topics discussed, is clear and
intelligible, and the work as a whole is timely. H. MELTZER.

Psychological Service Center, St. Louis. 


CHARLES H. JUDD. Education and Social Progress. New York:
Harcourt, Brace & Company, 1934, pp. XII + 285. 


Professor Judd is at his best when he is confronted with a broad 
canvas and has a big brush in his hand. This review of the present 
strengths and weaknesses of American education could only have been 
written (or dictated?) by one thoroughly versed in the workings of both 
European and American school systems and their historical back- 
grounds. Further, there is nobody in America who quite equals Judd 
in making sound generalizations from a welter of apparently discon- 
nected details. 

There are eleven chapters dealing with such topics as Reorganization
of Education—a social problem; the Industrial System and the
education of children; Social Forces that determine the School Curricu- 
lum; The New Demand for Hygienic Conditions; the Solution of 
Educational Problems through Scientific Studies; Education in the 
Future Social Order and so forth. There are few quotations, prac- 
tically no references, and each topic is so briefly dealt with that one 
can read the book quite easily in three hours. 

Judd has a nimble mind and hits hard, so there are no dull moments 
for the reader. In dealing with the curriculum he shows how it has 
always been responsive to social pressures and even to propaganda. 
On the whole he thinks the enrichment of the curriculum has been 
advantageous and that the present is a period of experimentation. 
He, apparently, is quite content to see children force Latin out of 
school because they do not wish to study it. Such attitudes are 
almost incomprehensible to persons trained in European schools. 
To them the greatest weakness in American schools is this easy-osey 
way of introducing subjects into the curriculum in response to a 
so-called social demand, and dropping the hard ones to make room 
for something easier. 

In California, at a conference at which I spoke, responsible super- 
intendents of schools defended the introduction of cosmetology 
into high schools, because, forsooth, so many girls now earned livings 
in beauty parlors. I replied that some men got their livings by 
thieving, some women in ancient but questionable ways, but that was 
no reason for introducing courses in these subjects into the high schools. 
As a matter of fact you can’t depend upon any single body of knowl- 
edge being known by the high school graduates of America, and this 
fact, more than any others, is responsible for that lack of social 
solidarity which is so noticeable to visiting educators. 

Yet an American philosopher—one Dooley of Chicago—in speaking 
to his friend Hennessey about his child who had just gone to school for 
the first time said “I don’t care what they larns him, Hinnissey, if
it’s only hard enough.” Until there is a return to the more rigid 
standards of the past; to subjects which demand a rigorous intellectual 
discipline; to subjects which are hard because they have a tincture 
of iron in them; and to more puritanical standards there seems to be 
little hope for the future. Yet Judd would not agree with me, although 
he shows some signs of repentance in his plea for the inculcation of 
a spirit of earnestness in all high school pupils. 

The book, as the preface states, is addressed to parents, public
officials, and taxpayers. May they be induced to read this challenging 
work! PETER SANDIFORD. 

University of Toronto. 


NORMAN C. MEIER, Director. Studies in the Psychology of Art. University
of Iowa Studies in Psychology, No. XVIII, Psychological
Monographs Vol. XLV, No. 1, Princeton: Psychological Review
Co., 1933.


The eleven studies reported in this monograph have been made in 
the art section of the laboratory at the University of Iowa under 
the direction of Dr. Norman C. Meier. Nine of the reports are of 
researches in the artistic abilities of children between the ages of two 
and ten years. An introduction by Dr. Meier gives the historical 
orientation for this group of researches. The purpose of the first 
five studies is to discover the age at which various artistic capacities 
emerge. These include investigations of the age of appearance of the 
sensitivity to compositional balance, rhythm in graphic form, compo- 
sitional unity, color harmony, and emergence of creative artistic 
imagination. They are followed by accounts of differences in play- 
ground behavior in artistic and non-artistic children, variations in the 
aesthetic environment of artistic and non-artistic children, and 
differences in the psycho-physical capacities of artistic and non- 
artistic children. 

Perusal of the monograph leads to a conviction that Dr. Meier 
and his coworkers have made a contribution to the field of psychology 
which is of primary importance. Those interested in educational 
psychology or the psychology of pictorial art should welcome these 
studies with enthusiasm. They offer a convincing experimental 
approach in a neglected and difficult field, and are especially valuable 
because of new techniques used. Statistical analyses are employed 
for treating results. This monograph provokes the desire to read 
of further research on the topics considered. The articles, which 
were prepared by the individual experimenters, are for the most part 
well written, although some of them are unnecessarily obscure. 

An outstanding feature of the monograph is the excellent illustra- 
tions, including a number of fine color plates. The bibliographies 
offered are valuable to one interested in experimental aesthetics. 

The titles of the last two articles in the monograph are: “The 
Psycho-physical Capacities and Abilities of College Art Students 
of High and Low Standing,” and “An Experimental Investigation
of the Basic Aesthetic Factors in Costume Design.” 


ROY C. LANGFORD.
Kansas State College of Agriculture and Applied Science. 


NORMAN WOELFEL. Molders of the American Mind. New York:
Columbia University Press, 1933, pp. XII + 304. 


This is an interesting addition to educational literature, and for a 
number of reasons. Its critical purpose, showing its head as it does 
in an analysis of the points of view of “seventeen leaders in American
education,” is positive guarantee that each reader will be irritated into 
taking sides before his reading is completed, if, by chance, his partisan- 
ship is not so stirred that he will stop reading in order to take sides. 
The opportunity for irritation is facilitated by the classification which 
the author follows in his examination of the educational ideas and 
social values characteristic of the educators whom he treats. 

Such classification is not easily achieved, and it may be that the 
impossibility of keeping many of the individuals clearly within the 
somewhat neat limits that this scheme sets down will be looked upon 
by many as sufficient reason for not attempting it. What the author 
has done, however, has been to locate dominant trends in educational 
thought and to associate with each those people whose writings exhibit 
a central tendency that at least provides a directive factor if it does not 
entirely force them within a given frame. His groupings follow: 
Educators (1) stressing values inherent in American historic traditions, 
(2) stressing the ultimacy of science, and (3) stressing the implications 
of modern experimental naturalism. 

Mr. Woelfel is aware of the difficulties confronting his classification. 
Having put down his limits, however, he manages to force the central 
tendency of each educational view examined sufficiently into the open 
so that he is not easily charged with inaccuracy or misinterpretation on 
this score. Of the treatment of the positions, once they have been 
located—well, that is another story, as certain earlier reviews have 
revealed. When one recognizes that the author frankly associates 
himself with the position he designates “modern experimental naturalism,”
it ought to be no surprise to discover that readers who lean in
the other directions the classifications provide are in full cry at his
heels. For my own part, the quarrel that I would pursue in one or
two specific instances where an author seems not to have been understood,
is fully offset by the critical effort to examine the educational
and social scene by placing in sharp contrast the conflicting ideas that
have led us to inevitable confusion as their basic points of difference
have escaped us.

It is important that these contrasts be made. It is also important
that the educator be stimulated to the reflection that will make him
take sides with understanding and appreciation. At this point the
book should serve a useful purpose. It is a bit unfortunate that the
author adds to a mechanism of analysis intended to be stimulating a
trick of expression that must on occasion be irritating. And to some,
as to this reviewer, it may appear that the title of the book quite
outstretches the intent of the content.

What ought not to be overlooked in the heat that may be caused by
the critical passages is an effective, though brief, treatment in the
opening section of contemporary social change. The fact of change is
shown, important elements within it, such as the decline of the Christian
tradition and of the business regime, are presented, and our
resources for future social reconstruction are suggested. This chapter
clearly shows the need for a critical approach to educational thought
and practice, however one may judge the character of this volume.
Finally, it is gratifying, since this book not only grows out of a dissertation
study, but likewise out of one in education, to see that Dr. Woelfel
was encouraged to move to the interpretative level and permitted to
swing into a readable style. H. GORDON HULLFISH.
Ohio State University, and The Dalton School.


WILLIAM McDOUGALL. The Energies of Men. New York: Charles
Scribner’s Sons, 1933, pp. XII + 395.


This book is an abridgment and a revision of the author’s Outline
of Psychology and Outline of Abnormal Psychology. As a revision, it
has much material of interest and value to students of systematic
psychology. It is the newest as well as the clearest exposition of
McDougall’s hormic psychology that has yet appeared. But its very
excellence as a systematic exposition is its chief weakness as an elementary
text. The author has not hesitated to mix straightforward
description of experimental work and non-controversial exposition
with technical arguments for the hormic interpretation and against
other explanatory systems. It was his intention to relegate the more
technical matters to appendices at the ends of the chapters, but the
separation has not always been the most fortunate. Some of the most
valuable passages in the book are found in these appendices, as for 
instance the descriptions of the work of Cannon and Pavlov. On 
the other hand some of the most highly technical sections appear in the 
text proper; for example the chapter on Disposition, Temper, Temperament
and Character, which consists mainly of an attempt to give
genuinely differential definitions of these and one or two other terms, 
and so outline a framework for future research in a field in which little is 
known at present. 

For the beginning student of education at least, it is the reviewer’s 
opinion that the selection of topics is on the whole superior to that 
found in any other text. Most of the space usually devoted to the 
nervous system, sensation and perception is given instead to six 
excellent chapters on abnormal psychology and mental hygiene. 
There is much less overlap with the subject matter of educational 
psychology than is the case with most elementary texts. In view of 
these facts there is little doubt that many teachers of the elementary 
course will adopt this book as a text in spite of its technical and controversial
passages. Its place in the literature of systematic psychology
is in any case assured. EDWARD E. CURETON.

Alabama Polytechnic Institute. 


GEORGE S. STEVENSON AND GEDDES SMITH. Child Guidance Clinics.
New York: The Commonwealth Fund, 1934, pp. VII + 186. 


Child guidance clinics in the United States are recent develop- 
ments. The first one of the modern pattern dates back to 1922. It 
was in the spring of that year that the National Committee for Mental 
Hygiene organized the first demonstration clinic in connection with 
the juvenile court in the city of St. Louis. Recent though these 
clinics are, they have been an important source of stimulation for 
some of the most significant events which have taken place in the 
field of mental hygiene. Demonstration clinics were undertaken in 
Dallas, Monmouth County, New Jersey, Minneapolis, Cleveland, Los 
Angeles and Philadelphia. The influence of these clinics spread 
widely, so that within relatively few years clinics modelled
after these existed in most large cities in the United States as well as 
in many smaller ones. 

Every demonstration clinic was in a sense a new venture for the 
Division on Community Clinics of the National Committee for Mental 
Hygiene. Out of these ventures and experiences the leaders in the 
field were in a position to gain new insights for future undertakings.
Of the many leaders in the field none was in a more favorable position 
to obtain a full, sound and rich picture of the child guidance patterns 
as they emerged than Dr. George S. Stevenson, director of the Division 
on Community Clinics. In the book Child Guidance Clinics Dr. 
Stevenson with the help of Geddes Smith traces the genesis and 
development of the child guidance movement, describes the workings 
of present day clinics, analyzes their services and procedures, and 
in the light of this analysis discusses trends and possibilities. 

In his chapter on “Pioneer Clinics,” Dr. Stevenson mentions 
the fact that Lightner Witmer established a psychological clinic at 
the University of Pennsylvania in 1896, but traces the modern child 
guidance movement back to the organization of the Chicago Juvenile 
Psychopathic Institute in 1909 under the direction of Dr. William 
Healy. His expressed reason for doing so was that Dr. Witmer’s 
clinic was primarily interested in serving educational institutions, 
whereas Dr. Healy’s procedure both in approaches and in predominant 
interest served as a model for later clinic organizations. Dr. Healy’s 
approach included both the medical and the psychological angle, and 
his interest was primarily in delinquency. Other institutions men- 
tioned in this chapter are the Judge Baker Foundation, the Boston 
Psychopathic Hospital and the Henry Phipps Psychiatric Clinic. 
Other personalities mentioned include Dr. Adolf Meyer, Dr. Augusta
Bronner, Dr. Thomas W. Salmon and Dr. Herman Adler. 

The three-fold functions of the child guidance clinic as given in 
this book are: Study and treatment of patients; interesting other 
community agencies in the prevention of behavior and personality 
disorders in children and in promising methods of dealing with them 
when they occur; and attempting to reveal to the community, through 
the first-hand study of individual children, the unmet needs of groups 
of children. 

In their evaluative comments, the authors make no pretentious
claims. In their last chapter on “Trends and Possibilities” they
do claim that “clinical service for child guidance gives effect, on a
limited scale, to the best current thinking about the way to prevent
delinquency and mental disease.” Throughout the book, however,
they indicate a clear perception of the possibility as well as probable 
desirability of change in organization, methods and manner of inte- 
gration. Their claims as well as general outlook are well described 
in the last paragraph of the book, which reads as follows: 

“The child guidance clinic is more than a therapeutic agency.
It is a tool for synthesizing the most promising approaches to problems 
of behavior and personality in childhood, and for demonstrating the 
synthesis of the professions concerned with these problems. It is a 
laboratory in which new leads may be found for the study of the child. 
As such, it has a place in social evolution.” H. MELTZER. 
Psychological Service Center, St. Louis. 


HELEN M. WALKER. Mathematics Essential for Elementary Statistics.
New York: Henry Holt & Co., 1934, pp. VII + 246. 


Since social science majors in college seldom include mathematics 
among their electives, and no social science curriculum requires it, 
students of psychology, education, sociology, and the like are almost 
universally deficient in their command of the arithmetic and algebra 
encountered in even an introductory course in statistics. 

Miss Walker’s book provides material for one method of remedying 
this situation. Its thirty-one chapters are devoted to such topics 
as square root, algebraic symbolism, significant figures, graphs, 
logarithms, etc., and the presentation is carefully planned to aid the 
student in mastering the material. With the exception of a few 
chapters, each topic is introduced by a pre-test which enables the 
student to judge his present need for a study of the chapter’s content, 
and at the end of the section is another test by means of which the 
learner may check up on his grasp of the material studied. 

Throughout the book, emphasis is laid upon those mathematical 
principles and processes which the educational statistician is most apt 
to encounter, and a preponderance of space is devoted to those con- 
cepts which usually cause trouble for the beginning statistics student. 
Thus “Significant Figures,” “Placing a Decimal Point,” “Factoring
and Summation,” and the like have whole sections to themselves, while
only a dozen pages are devoted to the whole group of concepts,
“Variable, Unknown, Parameter, Function,” and a half-dozen to
“Fitting a Straight Line to a Swarm of Points.”

That Miss Walker has acted wisely in this choice of emphasis will 
be recognized by every teacher of a first course in educational statis- 
tics. Such teachers will recognize also the need for reviewing, as the 
book does, some arithmetical material whose mastery is commonly 
supposed to have been achieved in grammar school, and they will 
seize upon the volume as a welcome aid in helping their pupils to 
master the swarm of mathematical minutiae which becloud the first
approach to educational statistics.


Indeed one might almost say that the obvious excellence of this 
study manual constitutes its greatest danger. The typical social 
science student finishes his introductory course in statistics with 
the same impression he had gleaned from other students before he 
entered it; namely, that statistics consists largely of technical terms 
and computation. Since relatively few students go on to advanced 
work, the majority fail ever to grasp the larger and more fundamental 
concepts of statistics. Only to the occasional advanced student is it 
given to see statistical inference as a practical system of logic to which 
computation and nomenclature are merely efficient handmaidens. 

Miss Walker’s book will make it so much easier to “put across” 
elementary statistics as now taught—distributions, central tendency, 
dispersion, correlation, and so on—that its availability constitutes 
a real menace to the coming generation of beginning students. Hap- 
pily, however, those teachers who do insist that their students master 
fundamental ideas before merely ancillary details will find this study 
manual of great assistance when computational work is taken up in its 
proper place. P. J. RULON.

Harvard University. 